[Colloquium] Reminder: Dziedzic/MS Presentation/Nov 6, 2017

Margaret Jaffey via Colloquium colloquium at mailman.cs.uchicago.edu
Fri Nov 3 09:35:32 CDT 2017


This is a reminder about Adam Dziedzic's MS Presentation on Monday.

------------------------------------------------------------------------------
Date:  Monday, November 6, 2017

Time:  1:30 PM

Place:  Ryerson 276

M.S. Candidate:  Adam Dziedzic

M.S. Paper Title: Data Loading, Transformation and Migration for
Database Management Systems

Abstract:
The exponential rate at which data is produced and gathered nowadays
has turned data loading and migration into major bottlenecks of the
data analysis pipeline. Data loading is an initial step of data
analysis where raw data, such as CSV files, are transformed to an
internal representation of a Database Management System (DBMS).
However, given that the analysis is complex and diverse, many users
utilize more than a single system since specialized engines provide
superior performance for a given type of data analysis. Thus, the data
have to be transformed and migrated between many DBMSs. To ease the
burden of dealing with a diverse set of disparate DBMSs, polystores
seamlessly and transparently integrate specialized engines to expose a
single unified interface to users and applications. A polystore system
has to migrate data when a workload change results in poor performance
due to inadequate mapping of data objects to engines. Moreover,
partial results of query executions need to be migrated; for example,
the final output of a MapReduce job is transferred to a relational
database and joined with data selected from a table. To alleviate the
aforementioned problems, first, we identify the overheads of data
loading by carrying out an empirical analysis in three dimensions:
software, hardware, and application workloads. Our experimental
results show that modern DBMSs are unable to saturate the available
hardware resources of multi-core platforms. Second, we leverage our
findings from the empirical analysis of data loading and demonstrate
how to harness task and data level parallelism, concise binary
formats, adaptive processing, compression, and modern hardware to
achieve performant data transformation and migration. We present the
results of our accelerators for four DBMSs. Our data migration
solution is part of BigDAWG, a reference implementation of a polystore
system, and can be used as a general framework where fast data
loading, transformation, and migration are required. We show how the
data loading process should use all the resources efficiently to
achieve a 10 times speedup and that we can migrate data 4 times faster
by applying concise binary formats.

Adam's advisor is Prof. Aaron Elmore.

Log in to the Computer Science Department website for details:
 https://www.cs.uchicago.edu/phd/ms_announcements#ady

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

