[CS] Ruidan Li Candidacy Exam/Dec 11, 2025
via cs
cs at mailman.cs.uchicago.edu
Wed Dec 10 14:36:12 CST 2025
This is an announcement of Ruidan Li's Candidacy Exam.
===============================================
Candidate: Ruidan Li
Date: Thursday, December 11, 2025
Time: 9 am CST
Remote Location: https://uchicago.zoom.us/j/6165423241?pwd=ejhpSEl2dTdaRFNBMzAvOEZ6RytjUT09&omn=97789485869&from=addon
Location: JCL 298
Title: A Robust and Comprehensive Pipeline for Detecting Various Disk Fail-Slow Failures at Scale
Abstract: Disk performance instability is a growing problem in large-scale storage systems, where disks often become slow and erratic instead of failing cleanly. These fail-slow behaviors are difficult to capture because they appear under diverse workloads, contain substantial noise, and may be overshadowed by even slower peers. As a result, many existing detectors either miss important cases or require environment-specific tuning that does not scale across fleets.
We present SystemX, a robust and comprehensive detection pipeline for fail-slow disks at scale. Using a very large production dataset that spans both solid-state drives and hard disk drives, we first identify five major patterns of fail-slow behavior and two common noise patterns that frequently confuse prior methods. Guided by this study, SystemX builds a compact signal that combines spatial deviation across disks with temporal persistence, aggregates it into per-disk risk levels using statistics-based scoring, and applies further filtering to expose subtle overshadowed cases.
Across multiple vendor and deployment datasets with RAID and non-RAID configurations, SystemX detects about 33 to 50 percent more fail-slow disks than the state-of-the-art method, maintains accuracy above 95 percent over a wide range of alert thresholds, and runs several times faster, making it suitable for continuous fleet-wide monitoring.
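For readers unfamiliar with the idea of combining spatial deviation with temporal persistence, the following is a minimal illustrative sketch only. It is not SystemX's actual signal or scoring, which the abstract does not specify. The function names, the median/MAD-based deviation, the relative floor, the threshold of 3.0, and the sample latencies are all assumptions made for illustration.

```python
from statistics import median

def spatial_deviation(latencies, rel_floor=0.1):
    """Robust deviation of each disk's latency from its peers in one time window.

    Hypothetical helper: uses median and MAD, with a floor relative to the
    median so that near-identical peers do not inflate the score.
    """
    med = median(latencies.values())
    mad = median(abs(v - med) for v in latencies.values())
    scale = max(mad, rel_floor * med)
    return {disk: (v - med) / scale for disk, v in latencies.items()}

def persistence_score(windows, z_threshold=3.0):
    """Fraction of windows in which each disk is much slower than its peers."""
    counts = {}
    for latencies in windows:
        for disk, z in spatial_deviation(latencies).items():
            counts[disk] = counts.get(disk, 0) + (1 if z > z_threshold else 0)
    return {disk: c / len(windows) for disk, c in counts.items()}

# Made-up per-window latencies (ms) for five disks: d4 is slow in three of
# four windows, while d1 shows a single transient spike.
windows = [
    {"d0": 1.0, "d1": 1.1, "d2": 0.9, "d3": 1.0, "d4": 9.0},
    {"d0": 1.0, "d1": 7.0, "d2": 0.9, "d3": 1.1, "d4": 8.0},
    {"d0": 1.0, "d1": 1.1, "d2": 0.9, "d3": 1.0, "d4": 1.05},
    {"d0": 1.0, "d1": 1.1, "d2": 0.9, "d3": 1.0, "d4": 10.0},
]
scores = persistence_score(windows)
```

In this toy data, d4 scores 0.75 and d1 scores 0.25, so a persistence cutoff between the two would flag the persistently slow disk while ignoring the one-off spike.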
Advisor: Haryadi Gunawi
Committee Members: Haryadi Gunawi, Kexin Pei, Erci Xu