[CS] Reminder: [defense] Hao/Dissertation Defense/Nov 6, 2020

Thu Nov 5 12:55:33 CST 2020

https://uchicago.zoom.us/j/5059025192?pwd=bkNYZnRRcEZUdnBUUkYrRE4ya2cyQT09
Password: 667590

       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***

Candidate:  Mingzhe Hao

Date:  Friday, November 6, 2020

Time:  1:00 PM

Place:  via Zoom

Title: Fast and Stable Data and Storage Systems in Milli/Micro-second
Era

Abstract:
As numerous new data-intensive applications and storage hardware
emerge, maintaining performance sustainability and robustness of data
and storage systems is becoming more intricate and challenging. Users
want numerous demands (e.g., real-time latency, continuously high
throughput, workload elasticity) to be met. Service providers are
facing a hard task of delivering acceptable service-level objectives
(SLOs) such as low and highly stable latencies. Both parties
essentially want the same goal, but the gap between them continues to
widen and grow more complex. Customers keep introducing more data
paradigms (e.g., big data, machine learning, IoT) and bombarding
providers with application-specific requirements that are far from
trivial to fulfill, which makes it increasingly hard to design generic
systems that can persistently deliver fast performance.

This dissertation aims at building fast and robust next-generation
data and storage systems. Specifically, we architect these systems
generically to achieve fast, low-latency responses even in the most
turbulent scenarios. As systems grow in complexity, this dissertation
tackles this significant problem from four different angles:

1. Data approach: We need a thorough, large-scale understanding of
increasingly complex real-world issues to help us pinpoint the crux of
the problem and potential solutions. Here, we present TAILATSTORE,
which mines performance logs tracking half a million disks and
thousands of SSDs, and to the best of our knowledge is the most
extensive study of storage device-level performance variability.
TAILATSTORE reveals that storage performance instability is not
uncommon, and that the primary causes of slowdowns are the internal
characteristics and idiosyncrasies of modern disk and SSD drives,
motivating the design of tail-tolerant mechanisms.

2. Hardware-level approach: While other work attempts to reduce
performance variability at the application level with techniques like
speculation, we take a different view: cutting performance variability
“at the source” is more effective. Specifically, in TINYTAILFLASH, we
re-architect SSDs to collaborate with the host and circumvent almost
all noise induced by background operations. Furthermore, to highlight
the importance of the hardware-level approach and facilitate its
development, we present FAILSLOWATSCALE, a study of hardware with
performance degradation, and FEMU, a software flash emulator for
fostering future SSD research.

3. OS-level approach: At the heart of the system stack is the OS;
hence, the question is how the OS should evolve today to provide
stable performance for the deep stack. In tackling this problem, our
insight is that the OS is not just the OS for personal computers, but
rather the OS for the “datacenter”. In this context, we present
MITTOS, an OS that is SLO-aware and capable of predicting the latency
of every I/O and failing over slow I/Os to peer OSs. MITTOS’s no-wait
approach helps reduce I/O completion time by up to 35% compared to
wait-then-speculate approaches.

Additionally, as another effort toward an “OS for the datacenter”, we
present LeapIO, which promotes address transparency across components
in the cloud storage stack to smooth the offload of complex storage
services to today’s I/O accelerators. LeapIO employs a set of OS/software
techniques on top of hardware capabilities to provide a uniform
address space across x86 cores and I/O accelerators, allowing the host
to portably leverage the accelerators.

4. ML-for-system approach: Current systems are growing too complex for
human designers to come up with heuristic-based policies for optimal
system control. Many different storage models exist, and they are
highly heterogeneous and unpredictable in performance. Applications
cannot reason about how they work, and predicting systems’ performance
is a black art. This situation raises the question of whether machine
learning can help. To answer this, we present LINNOS, which uses
neural networks to predict the performance of every request and every
I/O, making otherwise unpredictable system performance highly
predictable. LINNOS supports black-box devices and real production
traces without requiring any extra input from users, while
outperforming industrial mechanisms and other approaches. Compared to
hedging and heuristic-based methods, LINNOS improves average I/O
latency by 9.6-79.6% with 87-97% inference accuracy and 4-6μs
inference overhead for each I/O, demonstrating that it is possible to
incorporate machine learning inside operating systems for real-time
decision-making.

Lastly, this dissertation discusses future research directions for
building fast and stable data and storage systems and helping storage
applications achieve performance predictability in the
milli/micro-second era.

Mingzhe's advisor is Prof. Haryadi Gunawi.

Log in to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://newtraell.cs.uchicago.edu/phd/phd_announcements#hmz20000

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Tricia Baclawski
Student Affairs Administrator
Computer Science Department
5730 S. Ellis - Room 350
Chicago, IL 60637
pbaclawski at uchicago.edu
(773) 702-6854
/pronouns: she, her, hers/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=