[Colloquium] Suhail Rehman Candidacy Exam/Feb 16, 2022

Megan Woodward meganwoodward at uchicago.edu
Mon Feb 7 08:35:50 CST 2022


This is an announcement of Suhail Rehman's Candidacy Exam.
===============================================
Candidate: Suhail Rehman

Date: Wednesday, February 16, 2022

Time: 12:30 pm CST

Remote Location: https://uchicago.zoom.us/j/95986092331?pwd=bjFrNzBwUzNBaTJLUXh5dWJkQnduQT09
Meeting ID: 959 8609 2331
Passcode: 517138

Title: Reconstructing the Lineage of Artifacts in Data Lakes

Abstract: As organizations' data storage, archival, and retrieval needs have evolved beyond simple
relational databases to complex collaborative data lake architectures, a data stewardship
and organization crisis has emerged. It is difficult to reason about existing data artifacts in
data lakes without reliable lineage information and metadata. There is substantial interest in
automated tools and techniques for inferring data artifacts' lineage within data lakes.

In this thesis, we study lineage in data lakes, specifically the possibility, accuracy,
and efficiency of retrospectively constructing the lineage of data artifacts within a data lake.
We propose a method that retrospectively applies sketch and estimation
techniques, as well as similarity functions at various levels of granularity, to (a) organize
data artifacts into the individual workflows that generated them, and (b) infer a lineage tree for
said artifacts that most closely resembles their ground-truth derivations.

Advisors: Aaron Elmore

Committee Members: Aaron Elmore, Michael Franklin, and Raul Castro Fernandez