[Colloquium] Rui Liu Dissertation Defense/Oct 16, 2023

Megan Woodward meganwoodward at uchicago.edu
Fri Oct 13 13:31:38 CDT 2023


This is an announcement of Rui Liu's Dissertation Defense.
===============================================
Candidate: Rui Liu

Date: Monday, October 16, 2023

Time:  3:30 pm CDT

Remote Location: https://uchicago.zoom.us/j/9593449761?pwd=ZGhHNHZ5dkthZ3lhM3pYSGw4R3JDdz09

Location: JCL 390

Title: Resource-Aware Optimizations for Data-Intensive Systems

Abstract: In modern computing environments, data-intensive systems have assumed a significant role, playing a critical part in a broad spectrum of applications. However, modern computing environments are increasingly resource-dynamic: resource availability is ephemeral, and the monetary cost of resources fluctuates. Such environments are incompatible with current data-intensive system designs and workloads; they require a rethink of design principles and necessitate three new primitives: resource expediency, resource arbitration, and resource suspension and resumption. In this dissertation, we exploit the research opportunities presented by these three primitives and develop prototype data-intensive systems tailored to address the resulting challenges.

Firstly, to explore resource expediency, we propose and implement Repack for deep learning training, which shares common I/O and computing processes among models on the same computing device. We further present a comprehensive empirical study of Repack and end-to-end experiments that suggest significant improvements for hyperparameter tuning. The results show that: (1) repacking two models can bring up to a 40% performance improvement over unpacked setups for a single training step, and the improvement increases as more models are packed; (2) the benefit of the repack primitive largely depends on a number of factors, including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; and (4) a repack-aware Hyperband is up to 2.7x faster than the original Hyperband, with the improvement growing as memory size, and thus the density of packed models, increases.

Secondly, we propose and design a resource arbitration framework, Rotary, that continuously prioritizes progressive iterative analytics (PIA) jobs and determines if/when to reallocate or preempt resources for them. In contrast to classic computing applications, PIA jobs keep providing approximate or partial results to users by performing computations on a subset of the entire dataset until either the users are satisfied with the results or predefined completion criteria are met. Typically, PIA jobs have various completion criteria, produce diminishing returns, and process data at different rates. Within Rotary, we consider two prevalent cases, approximate query processing (AQP) and deep learning training (DLT), and implement two resource arbitration systems, Rotary-AQP and Rotary-DLT. We build a TPC-H-based AQP workload and a survey-based DLT workload to evaluate the two systems. The results demonstrate that Rotary-AQP and Rotary-DLT outperform state-of-the-art systems and confirm the generality and practicality of the proposed resource arbitration framework.

Finally, we present an adaptive query execution framework, Riveter, for cloud-native databases. It can suspend queries when resources are limited or costs are unexpectedly high, then resume them when resources become available and cost-effective. Within Riveter, we implement various strategies, including (1) a redo strategy that terminates queries and subsequently re-runs them; (2) a pipeline-level strategy that suspends a query once one of its pipelines has completed, thereby reducing the storage requirements for intermediate states; and (3) a process-level strategy that enables the suspension of query execution processes at any given moment but generates a substantial volume of intermediate states for query resumption. We also devise a cost model designed to determine the optimal strategy, with minimum latency, for different queries. We conducted a performance study, an end-to-end analysis, and a cost model evaluation using the TPC-H benchmark. Our results show how Riveter's suspension and resumption strategies differ in the size of persisted intermediate data, and they confirm Riveter's adaptivity and efficiency.

Advisors: Aaron Elmore and Michael Franklin

Committee Members: Aaron Elmore, Michael Franklin, and Sanjay Krishnan




