[CS] TODAY: Yuanjian Liu Dissertation Defense/May 2, 2025

via cs cs at mailman.cs.uchicago.edu
Fri May 2 10:13:29 CDT 2025


This is an announcement of Yuanjian Liu's Dissertation Defense.
===============================================
Candidate: Yuanjian Liu

Date: Friday, May 02, 2025

Time: 1 pm CDT

Remote Location: https://uchicago.zoom.us/j/6019256463?pwd=SENUMzJEZEJDMVhMaHZiVDI2V09qdz09

Location: JCL 346

Title: Hybrid Lossy Compression Methods Can Confidently Optimize Wide Network Transfer of Complex Datasets

Abstract: Large volumes of data generated by scientific simulations, genome sequencing, and other applications need to be moved among clusters for data collection and analysis. Data compression techniques have effectively reduced data storage and transfer costs. However, users' requirements for interactive control over both data quality and compression ratio are non-trivial to fulfill. Lossy compression methods need to respect several data constraints to be useful in a realistic data transfer scenario. In this thesis, I propose a novel Compression-as-a-Service (CaaS) platform called GlobaZip with five important contributions: (1) a multi-interval/multi-region-based compression algorithm that supports several data constraints to further limit the loss of data fidelity even though the compression is lossy; (2) a layer-by-layer compression technique that allows a much higher parallel compression rate on HPC systems and can coordinate CPU cores on multiple compute nodes to compress extremely large files without out-of-memory errors; (3) a decision tree-based compression performance prediction model that lets users estimate compression characteristics, including compression ratio, time, and data fidelity, with very little computational overhead; (4) an optimized reference-based genome sequence compression algorithm that exceeds the performance of state-of-the-art algorithms by using a more fine-grained sequence alignment procedure, read reordering, a novel dominant bitmap method for quality score compression, and a few other small optimizations; (5) a Qt5-based user-facing app that uses Globus Compute and Globus Transfer to give users a universal interface for orchestrating remote data compression and transfer. Experiments on multiple real-world datasets on geographically distributed computers show that GlobaZip can significantly improve data transfer efficiency, with a performance gain of more than 10x on computing clusters with relatively slow networks.

Advisors: Ian Foster and Kyle Chard

Committee: Ian Foster, Kyle Chard, Sheng Di



