[CS] Reminder - Suhail Rehman Candidacy Exam/Feb 16, 2022

Megan Woodward meganwoodward at uchicago.edu
Wed Feb 16 08:30:17 CST 2022

This is an announcement of Suhail Rehman's Candidacy Exam.
Candidate: Suhail Rehman

Date: Wednesday, February 16, 2022

Time: 12:30 pm CST

Remote Location: https://uchicago.zoom.us/j/95986092331?pwd=bjFrNzBwUzNBaTJLUXh5dWJkQnduQT09
Meeting ID: 959 8609 2331
Passcode: 517138

Title: Reconstructing the Lineage of Artifacts in Data Lakes

Abstract: As organizations' data storage, archival, and retrieval needs have evolved beyond simple
relational databases to complex collaborative data lake architectures, a data stewardship
and organization crisis has emerged. It is difficult to reason about existing data artifacts in
data lakes without reliable lineage information and metadata. There is substantial interest in
automated tools and techniques for inferring data artifacts' lineage within data lakes.

In this thesis, we study lineage in data lakes, and specifically the possibility, accuracy,
and efficiency of retrospectively constructing the lineage of data artifacts within a data lake.
We propose a method that retrospectively applies sketch and estimation techniques, together
with similarity functions at various levels of granularity, to (a) organize data artifacts
into the individual workflows that generated them, and (b) infer a lineage tree for said
artifacts that most closely resembles their ground truth derivations.

Advisor: Aaron Elmore

Committee Members: Aaron Elmore, Michael Franklin, and Raul Castro Fernandez
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Suhail_Ph_D__Proposal-2.pdf
Type: application/pdf
Size: 1788963 bytes
Desc: Suhail_Ph_D__Proposal-2.pdf
URL: <http://mailman.cs.uchicago.edu/pipermail/cs/attachments/20220216/a792e067/attachment-0001.pdf>