[CS] Yuanjian Liu Dissertation Defense/May 2, 2025
via cs
cs at mailman.cs.uchicago.edu
Tue Apr 22 09:37:09 CDT 2025
This is an announcement of Yuanjian Liu's Dissertation Defense.
===============================================
Candidate: Yuanjian Liu
Date: Friday, May 02, 2025
Time: 1 pm CDT
Remote Location: https://uchicago.zoom.us/j/6019256463?pwd=SENUMzJEZEJDMVhMaHZiVDI2V09qdz09
Location: JCL 346
Title: Hybrid Lossy Compression Methods Can Confidently Optimize Wide Network Transfer of Complex Datasets
Abstract: Large volumes of data generated by scientific simulations, genome sequencing, and other applications need to be moved among clusters for data collection and analysis. Data compression techniques have effectively reduced data storage and transfer costs. However, users' requirements for interactive control over both data quality and compression ratio are nontrivial to fulfill. Lossy compression methods need to respect several data constraints to be useful in a realistic data transfer scenario. In this thesis, I propose a novel Compression-as-a-Service (CaaS) platform called GlobaZip with five important contributions: (1) a multi-interval/multi-region based compression algorithm that supports several data constraints to further limit the loss of data fidelity even though the compression is lossy; (2) a layer-by-layer compression technique that allows a much higher parallel compression rate on HPC systems and can coordinate CPU cores on multiple compute nodes to compress extremely large files without out-of-memory errors; (3) a decision tree-based compression performance prediction model that allows users to estimate compression characteristics, including compression ratio, time, and data fidelity, with very limited computation overhead; (4) an optimized reference-based genome sequence compression algorithm that exceeds the performance of state-of-the-art algorithms by using a more fine-grained sequence alignment procedure, read reordering, a novel dominant bitmap method for quality score compression, and a few other small optimizations; (5) a Qt5-based user-facing app that utilizes Globus Compute and Globus Transfer to provide users with a universal interface to orchestrate remote data compression and transfer. Experiments on multiple real-world datasets on geographically distributed computers show that GlobaZip can significantly improve data transfer efficiency, with a performance gain of more than 10x on computing clusters with relatively slow networks.
Advisor: Ian Foster
Committee: Ian Foster, Kyle Chard, Sheng Di