[CS] Rajini Shakya Wijayawardana MS Presentation Feb 19, 2025
via cs
cs at mailman.cs.uchicago.edu
Mon Feb 10 13:19:18 CST 2025
This is an announcement of Rajini Shakya Wijayawardana's MS Presentation
===============================================
Candidate: Rajini Shakya Wijayawardana
Date: Wednesday, February 19, 2025
Time: 9 am CST
Remote Location: https://www.google.com/url?q=https://uchicago.zoom.us/j/98043780497?pwd%3Dq8HZxgSLa1yjcSNH8xfbIshVAHy4Kh.1&sa=D&source=calendar&ust=1738099296243026&usg=AOvVaw383t-7mKPMwEG_uoTIuEQV
Location: JCL 298
Title: Tolerating Capacity Variation: Cloud Resource Management to Avoid Terminations
Abstract: The growth of cloud data centers has raised significant sustainability concerns and led to a power grid crisis. In this landscape, data center capacity is increasingly determined by external constraints, including power availability, carbon intensity and power prices.
We characterize the impact of variable capacity on cloud workload performance by evaluating commercial and synthetic cloud workloads under traditional scheduling. With capacity changes occurring 0.25--8 times per hour, Microsoft's Azure and Google's Borg cloud workloads can suffer goodput losses of 12--24% and 5--19%, respectively. Generalizing, we study synthetic heavy-tailed workloads and a range of carbon intensity-driven capacity variation. Our studies show that goodput loss increases with the heaviness of the tail and with increased capacity variation. We identify job terminations as the crucial phenomenon and focus on them as the critical performance metric and scheduling objective.
We derive the machine interval distribution from the imposed capacity variation and exploit it to reduce terminations under variable capacity. We explore four heuristics with varying levels of machine interval information: stable machine indices, time between capacity changes, and statistical and probabilistic information about the interval time remaining. The best-performing heuristic uses the uptime of an interval to estimate the conditional probability of its remaining time. This information is used to improve the assignment of jobs to intervals, reducing terminations by 32%, with a 0.9% goodput loss.
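As a rough illustration of the uptime-based idea described above, the following Python sketch estimates the conditional probability that a machine interval lasts long enough to finish a job, given how long the interval has already been up. It is not the heuristic from the thesis: the empirical estimator, the function names (survival_given_uptime, pick_machine), and the sample data are all assumptions made for this example.

```python
from bisect import bisect_right


def survival_given_uptime(interval_lengths, uptime, job_duration):
    """Estimate P(L > uptime + job_duration | L > uptime) from a sample
    of past interval lengths L (hypothetical empirical estimator)."""
    lengths = sorted(interval_lengths)
    n = len(lengths)
    # Number of observed intervals that lasted longer than the current uptime.
    alive = n - bisect_right(lengths, uptime)
    if alive == 0:
        return 0.0  # no observed interval lasted this long; assume no headroom
    # Of those, how many also lasted long enough to finish the job.
    survive = n - bisect_right(lengths, uptime + job_duration)
    return survive / alive


def pick_machine(machine_uptimes, interval_lengths, job_duration):
    """Return the machine id whose current interval is most likely to
    outlast the job (illustrative placement rule, not the thesis scheduler)."""
    return max(
        machine_uptimes,
        key=lambda m: survival_given_uptime(
            interval_lengths, machine_uptimes[m], job_duration
        ),
    )


# Assumed sample of past interval lengths (hours) and current machine uptimes.
past_lengths = [1, 2, 4, 8, 16]
uptimes = {"a": 0, "b": 3, "c": 5}
best = pick_machine(uptimes, past_lengths, job_duration=4)
```

In this made-up heavy-tailed sample, an interval that has already been up for a while is more likely to keep running than a fresh one, so the sketch places the 4-hour job on an interval with nonzero uptime rather than on the newest one.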
We then combine these heuristics to build the Interval-Aware Scheduler (IAS), which reduces terminations across a range of heavy-tailed workloads. On a synthetic workload, IAS lowers terminations by 98% over First-Fit, with a 1.25% goodput loss. IAS can achieve 5% greater goodput, with 32% lower terminations. In the extremely heavy-tailed Azure workload, IAS lowers terminations by 92%, with a 0.05% goodput loss.
Advisor: Andrew Chien
Committee members: Andrew Chien, Sanjay Krishnan, Junchen Jiang, Yves Robert