[CS] Reminder: [defense] Tang/Dissertation Defense/Nov 9, 2020

Tricia Baclawski pbaclawski at uchicago.edu
Fri Nov 6 08:31:59 CST 2020


https://uchicago.zoom.us/j/95934532114?pwd=OGNzY2Y4UTk5Y0YrOC9OdEY2UDAxQT09
Password: 586992

       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Dixin Tang

Date:  Monday, November 9, 2020

Time:  1:30 PM

Place:  via Zoom

Title: Thrifty Query Processing

Abstract:
Database systems have long taken one of two major approaches to
processing a dataset that changes over time (e.g., a data stream).
Eager query processing methods, such as continuous query processing,
stream computing, or immediate incremental view maintenance (IVM), are
optimized to reduce query latency. They eagerly maintain standing
queries by consuming all available resources to process new data
immediately, which can waste substantial CPU cycles and memory. Lazy
query processing methods, such as batch processing or deferred IVM,
instead defer query execution to a future point to reduce resource
consumption, but suffer from high query latencies. We find that
existing eager and lazy query execution approaches are optimized for
applications at the two ends of the resource-latency trade-off, while
the middle ground between the two is rarely exploited.

This dissertation proposes a new query processing paradigm, Thrifty
Query Processing (TQP), for middle-ground applications where users do
not need to see the up-to-date query result right after the data is
ready and can tolerate some time slackness before the result is
returned. TQP exploits this time slackness to reduce resource
consumption and allows users to tune it to trade off query latency
against resource consumption.
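
As a rough illustration of the trade-off described above, the sketch
below uses a toy cost model that is an assumption of this note, not
the model used in the dissertation: every incremental execution pays a
fixed overhead plus a per-tuple cost, and the latency the user sees is
the cost of the last execution after the data is complete. The
function name plan_batches and all of its parameters are hypothetical.

```python
def plan_batches(num_tuples, fixed_cost, per_tuple_cost, goal):
    """Pick the fewest equal-sized incremental executions that meet the goal.

    Toy model (assumed): one execution over b tuples costs
    fixed_cost + per_tuple_cost * b. All arguments are positive integers
    in the same abstract cost unit.
    Returns (executions, total_work, final_latency).
    """
    # The last execution runs after the data is complete, so its cost is
    # the latency the user observes. Find the largest batch that fits.
    max_batch = (goal - fixed_cost) // per_tuple_cost
    if max_batch < 1:
        raise ValueError("goal is too tight for this cost model")
    executions = max(1, -(-num_tuples // max_batch))  # ceiling division
    batch = -(-num_tuples // executions)              # tuples in the largest batch
    total_work = executions * fixed_cost + per_tuple_cost * num_tuples
    final_latency = fixed_cost + per_tuple_cost * batch
    return executions, total_work, final_latency


if __name__ == "__main__":
    # Tight goal (eager-like), middle ground, loose goal (lazy-like).
    for goal in (11, 110, 1010):
        print(goal, plan_batches(1000, 10, 1, goal))
```

Under these assumed costs, loosening the goal lowers the number of
executions and therefore the total work, which is the knob the
abstract describes users tuning.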

Implementing TQP requires redesigning several core database
components. First, we introduce a new user model that lets users not
only submit a SQL query but also specify time slackness information.
Specifically, users can specify a performance goal that represents the
maximum allowed time to return the result after the data is complete.
Second, we design a new query execution engine that leverages this
performance goal to reduce CPU cycles. This
execution engine includes optimizations for both a single query and
multiple queries. For a single query, we selectively delay parts of
the query to reduce resource consumption while still meeting the
performance goal. Specifically, we lazily execute the parts of a query
that most reduce resource consumption without significantly increasing
query latency. For multiple queries, we find that sharing their
execution in a single plan does not necessarily decrease overall
resource consumption. This is because sharing queries with different
performance goals requires the whole plan to meet the highest
performance goal (i.e., the lowest query latency), which pushes the
whole plan to execute overly eagerly for queries with lower
performance goals. The overhead of this eager execution can offset the
benefit of eliminating redundant work across queries. Therefore, we
selectively share queries to avoid the overhead of eager execution
while still eliminating redundant work. Finally, we design a memory
management component to release occupied memory resources when the
query is not active. We find that in many cases the data arrival rate
is low (e.g. late data). Depending on the time slackness allowed by
users, the query may have a long idle time. In this scenario, we
selectively release some memory resources (e.g. intermediate states
for a standing query) that are least useful for processing the new
data to reduce memory consumption. We implement TQP in CrocodileDB, a
resource-efficient database, and perform extensive experiments to
evaluate each component of CrocodileDB. We show that CrocodileDB can
significantly reduce CPU and memory consumption while providing
similar query latencies compared to existing approaches.
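
The multi-query sharing trade-off in the abstract can be made concrete
with a small sketch. It rests on an assumed toy model, not on
CrocodileDB's actual optimizer: each query needs a number of
incremental runs dictated by its performance goal, each run pays for a
common subplan plus the query's private operators, and a shared plan
must run at the pace of its most demanding query. The names
separate_work, shared_work, should_share, "runs", and "private_cost"
are all hypothetical.

```python
def separate_work(queries, common_cost):
    """Each query runs its own plan at its own pace, repeating the common part."""
    return sum(q["runs"] * (common_cost + q["private_cost"]) for q in queries)


def shared_work(queries, common_cost):
    """One shared plan runs at the pace of the most demanding query."""
    pace = max(q["runs"] for q in queries)
    return pace * (common_cost + sum(q["private_cost"] for q in queries))


def should_share(queries, common_cost):
    """Share only when the shared plan does less total work (assumed criterion)."""
    return shared_work(queries, common_cost) < separate_work(queries, common_cost)


if __name__ == "__main__":
    tight = {"runs": 10, "private_cost": 1}   # low-latency goal: many runs
    loose = {"runs": 1, "private_cost": 20}   # relaxed goal: a single run
    print(should_share([tight, loose], common_cost=5))

    similar = {"runs": 10, "private_cost": 20}  # same pace as `tight`
    print(should_share([tight, similar], common_cost=5))
```

In this toy setting, sharing the tight and loose queries forces the
expensive private operators of the loose query to run ten times, so
keeping them separate does less work; two queries with the same pace
benefit from sharing because the common subplan runs once per round.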

Dixin's advisor is Prof. Aaron Elmore.

Log in to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://newtraell.cs.uchicago.edu/phd/phd_announcements#totemtang

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Tricia Baclawski
Student Affairs Administrator
Computer Science Department
5730 S. Ellis - Room 350
Chicago, IL 60637
pbaclawski at uchicago.edu
(773) 702-6854
/pronouns: she, her, hers/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

