[Colloquium] Matt Baughman Candidacy Exam/Mar 12, 2025

via Colloquium colloquium at mailman.cs.uchicago.edu
Mon Mar 3 16:11:10 CST 2025


This is an announcement of Matt Baughman's Candidacy Exam.
===============================================
Candidate: Matt Baughman

Date: Wednesday, March 12, 2025

Time: 2:30 pm CST

Location: JCL 298

Remote: https://uchicago.zoom.us/j/8998852780?pwd=M2dZVnlGQVYrc2FIbXNJdURpQXJjQT09

Title: Corralling the Computing Continuum: Mobilizing Modern Distributed Resources using Adaptive Task Management

Abstract: Compute is the currency of the future, but right now there is no unified way to access that currency. The computing continuum describes the emergence of global compute infrastructure from a mesh of devices connected by increasingly high-bandwidth networks. To mobilize that infrastructure, we need a system that ties these diverse resources together: we need to corral the computing continuum. We propose the Adaptive Task Management (ATM) framework, a system that acts as a universal task manager, routing each incoming task for execution among the many resources that comprise the continuum. ATM is built on top of the Globus Compute framework and uses existing infrastructure ranging from edge devices to batch-scheduled HPC systems. Up to this point, ATM development has focused on a flat hierarchy with client-centered task scheduling and dispatch onto externally managed resources. We now propose a reformulation of ATM in line with the principles of the computing continuum. By leveraging burstable compute on the cloud and building virtual network topologies from the existing network connections between all devices, we intend to decentralize control of ATM and treat the network as a mesh rather than a hierarchy. From that mesh, we can cluster resources for each compute task, creating ad hoc distributed systems, and, to prevent bottlenecks, we can burst out to the cloud for its compute and network offerings. Finally, by deploying an ATM client on each endpoint in the system, we abstract away many limitations of having one client control all endpoints. Under this paradigm, each client can agnostically receive and dispatch tasks, enabling superior scalability, composability, and flexibility.

Advisors: Ian Foster and Kyle Chard

Committee: Omer Rana, Kyle Chard, Ian Foster

