[Colloquium] [defense] Li/Dissertation Defense/Jul 13, 2020

Margaret Jaffey margaret at cs.uchicago.edu
Mon Jun 29 11:19:36 CDT 2020


This is an announcement about Huaicheng Li's defense.

Here is the Zoom link to participate:


https://uchicago.zoom.us/j/92952579383?pwd=dkJoUVUxZW9FTzdycXBrTUhyTy9PUT09

Password: 701903

One tap mobile:
  +13126266799,,92952579383# US (Chicago)
  +16465588656,,92952579383# US (New York)

Dial by your location:
  +1 312 626 6799 US (Chicago)
  +1 646 558 8656 US (New York)
  +1 301 715 8592 US (Germantown)
  +1 346 248 7799 US (Houston)
  +1 669 900 9128 US (San Jose)
  +1 253 215 8782 US (Tacoma)
  888 788 0099 US Toll-free
  877 853 5247 US Toll-free

Meeting ID: 929 5257 9383
Password: 701903



       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Huaicheng Li

Date:  Monday, July 13, 2020

Time:  10:00 AM

Place:  remotely via Zoom

Title: Evolving Cloud Storage Stack for Predictability and Efficiency

Abstract:
With the exponential growth of data, which is expected to reach 175
zettabytes by 2025, cloud storage is increasingly becoming the central
hub for data management and processing. Among the many benefits cloud
platforms promise, predictable performance and cost-efficiency are two
fundamental factors driving the success of modern cloud storage.
However, as modern cloud storage infrastructure undergoes rapid
changes in both software and hardware, new challenges emerge for
achieving predictable performance with efficiency.

In more detail, modern data-intensive applications and a new wave of
computing paradigms (e.g., data analytics, ML, serverless) are driving
the storage stack through a radical shift towards more feature-rich
software designs on top of increasingly heterogeneous architectures.
As a result, today's cloud storage stack is extremely heavyweight and
complex, burning 10-20% of data center CPU cycles and introducing
severe performance non-determinism (i.e., long tail latencies).
Unfortunately, the deployment of new acceleration hardware (e.g., NVMe
SSDs and I/O co-processors) only partially addresses the problem. Due
to the intrinsic complexities and idiosyncrasies of the hardware
(e.g., NAND Flash management) and the lack of system-level support, it
remains a challenge to design performant and cost-efficient cloud
storage systems. In particular, achieving sub-millisecond latency
predictability in a cost-efficient manner is the new battlefield.

Rooted in a deep understanding and analysis of the existing
software/hardware stack, this dissertation focuses on building new
abstractions, interfaces, and end-to-end storage systems that achieve
predictable performance and cost-efficiency using a software/hardware
co-design approach. By revisiting the challenges across different
layers in a holistic manner, the co-design approach opens up simple
yet powerful system-level policy designs that opportunistically
exploit hardware idiosyncrasies and heterogeneity. The systems we
build can reduce latency spikes by up to orders of magnitude and
increase cost savings by 20x.

To address the challenge of predictable performance in modern Flash
storage systems, we present TeaFA, a tail-evading flash array design
delivering deterministic performance. TeaFA uniquely combines a simple
yet powerful host-SSD interface, a time-window mechanism, and data
redundancy to proactively and deterministically reconstruct late
requests, with only minor changes to the host software and device
firmware. Evaluation results across 9 data center storage traces and
several real storage workloads (e.g., FileBench, YCSB/RocksDB) show
that TeaFA improves on the baseline by orders of magnitude and is only
1.1x to 2.1x slower than an ideal case with no tail latencies induced
by background operations. Compared to other state-of-the-art works
focusing on improving I/O performance (e.g., the Proactive approach,
Preemptive GC, P/E Suspension, Flash-on-Rails, and Harmonia), TeaFA is
more deterministic and effective in cutting tail latencies while being
less intrusive and easier to deploy.

Although TeaFA effectively improves tail latencies, a significant
portion of CPU cycles is needed to perform the reconstruction
computations. Worse, at large scale, the "storage tax" that cloud
providers have to pay takes up to 10-20% of datacenter CPU cycles.
Thus, it is challenging to achieve cost/resource-efficiency in modern
cloud storage stack designs. One opportunity is to utilize modern I/O
accelerators for cost-efficient storage offloading. Yet the complex
cloud storage stack is not completely offload-ready for today's I/O
accelerators. To tackle the cost-efficiency challenge, we present
LeapIO, a next-generation cloud storage stack that leverages ARM-based
co-processors to offload complex storage services. LeapIO addresses
many deployment challenges, such as hardware fungibility, software
portability, virtualizability, composability, and efficiency. It
employs a set of OS/software techniques and new hardware properties to
provide a uniform address space across the x86 and ARM cores, minimize
data copies, and directly expose virtual NVMe storage to unmodified
guest VMs. At its core, the LeapIO runtime enables agile storage
service development in user space. Storage services on LeapIO are
"offload-ready": they can run portably on an ARM SoC or on host x86
cores in a trusted VM. The software overhead amounts to only a 2-5%
throughput reduction compared to bare-metal performance (still
delivering the peak bandwidth of 0.65 million IOPS on a datacenter
SSD). Our current SoC prototype also delivers acceptable performance,
a further 5% reduction on the server side (and up to 30% on the
client), but with more than 20x cost savings. Overall, LeapIO helps
cloud providers cut the storage tax and improve utilization without
sacrificing performance.
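As a loose illustration of what "offload-ready" means (hypothetical
Python interfaces, not LeapIO's actual runtime API), the sketch below
codes a storage service against an abstract NVMe queue pair, so the
same service logic can be hosted either in an x86 trusted VM or on
the ARM SoC:

    # Hypothetical sketch: a storage service written against an
    # abstract NVMe queue pair so the same code runs on host x86 or an
    # ARM SoC. Interfaces are illustrative, not LeapIO's runtime API.
    from abc import ABC, abstractmethod

    class NvmeQueuePair(ABC):
        """Uniform view of a submission/completion queue pair; the
        runtime maps it into an address space shared by x86 and ARM."""
        @abstractmethod
        def pop_sq(self):
            """Return the next guest NVMe command, or None."""
        @abstractmethod
        def push_cq(self, cid, status):
            """Post a completion entry for command id `cid`."""

    class MirrorService:
        """Example service policy: mirror guest writes to two SSDs.
        It touches only the queue abstraction, never the transport,
        so it stays portable across host and SoC deployments."""
        def __init__(self, qp, primary, mirror):
            self.qp, self.primary, self.mirror = qp, primary, mirror

        def poll_once(self):
            cmd = self.qp.pop_sq()
            if cmd is None:
                return
            if cmd.opcode == "write":
                self.primary.write(cmd.lba, cmd.data)
                self.mirror.write(cmd.lba, cmd.data)
                self.qp.push_cq(cmd.cid, status=0)
            elif cmd.opcode == "read":
                # Data lands in a buffer in the shared address space,
                # avoiding extra copies between x86 and ARM views.
                cmd.buf[:] = self.primary.read(cmd.lba, cmd.nblocks)
                self.qp.push_cq(cmd.cid, status=0)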

Finally, we discuss the importance of scalable and extensible research
platforms for fostering future full-stack software/hardware storage
research. Existing software platforms (e.g., SSD/SoC simulators or
emulators) are limited in the types of research they support,
outdated, and not scalable. Hardware platforms suffer from wear-out
issues and are difficult to use, so they are not a good choice for
early-phase exploration of new ideas either. We argue that it is a
critical time for the storage research community to have a new
software-based full-system SSD emulator. To this end, we build FEMU, a
software (QEMU-based) NVMe flash emulator. FEMU is cheap
(open-sourced), relatively accurate (0.5-38% variance as a drop-in
replacement for an OpenChannel SSD), scalable (supporting 32 parallel
channels/chips), and extensible (supporting both internal-only and
split-level SSD research). FEMU has been used by researchers at dozens
of institutions and in classes, demonstrating both the urgent need for
such a research platform and its success.
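The key mechanism behind a software flash emulator of this kind is
delay emulation: data is served from DRAM, but each completion is
postponed until a timing model of the NAND chips says the operation
would have finished. A minimal sketch of that idea follows, with
made-up latency constants and whole-chip granularity (FEMU's actual
timing model is finer-grained):

    # Minimal sketch of delay emulation for a flash emulator; latency
    # constants are made up and granularity is simplified to one queue
    # per chip. FEMU's real timing model is more detailed.
    import time

    READ_LAT  = 40e-6    # assumed NAND page read latency (40 us)
    WRITE_LAT = 200e-6   # assumed NAND page program latency (200 us)

    class EmulatedChip:
        def __init__(self):
            self.next_free = 0.0   # when queued work on this chip ends

        def schedule(self, op, now):
            """Emulated completion time of one page op: it starts when
            the chip becomes free and then takes its NAND latency, so
            contention on a busy chip shows up as queueing delay."""
            start = max(now, self.next_free)
            self.next_free = start + (READ_LAT if op == "read"
                                      else WRITE_LAT)
            return self.next_free

    def emulate_io(chip, op):
        """Serve the data from DRAM immediately, but hold back the
        completion until the modeled NAND timing has elapsed."""
        done_at = chip.schedule(op, time.monotonic())
        delay = done_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # a real emulator arms a timer instead

Scaling this to 32 parallel channels/chips then amounts to keeping one
such availability timestamp per channel/chip and routing each page
operation to its owner.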



Huaicheng's advisor is Prof. Haryadi Gunawi.

Login to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://newtraell.cs.uchicago.edu/phd/phd_announcements#huaicheng

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (JCL 350)              (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

