[Colloquium] Suhail Rehman Candidacy Exam/Feb 16, 2022
Megan Woodward
meganwoodward at uchicago.edu
Mon Feb 7 08:35:50 CST 2022
This is an announcement of Suhail Rehman's Candidacy Exam.
===============================================
Candidate: Suhail Rehman
Date: Wednesday, February 16, 2022
Time: 12:30 pm CST
Remote Location: https://uchicago.zoom.us/j/95986092331?pwd=bjFrNzBwUzNBaTJLUXh5dWJkQnduQT09
Meeting ID: 959 8609 2331
Passcode: 517138
Title: Reconstructing the Lineage of Artifacts in Data Lakes
Abstract: As organizations' data storage, archival, and retrieval needs have evolved beyond simple
relational databases to complex collaborative data lake architectures, a data stewardship
and organization crisis has emerged. It is difficult to reason about existing data artifacts in
data lakes without reliable lineage information and metadata. There is substantial interest in
automated tools and techniques for inferring data artifacts' lineage within data lakes.
In this thesis, we study lineage in data lakes, in particular the possibility, accuracy,
and efficiency of retrospectively constructing the lineage of data artifacts within a data lake.
We propose a method that retrospectively applies sketch and estimation
techniques, as well as similarity functions at various levels of granularity, to (a) organize
data artifacts into the individual workflows that generated them, and (b) infer a lineage tree for
those artifacts that most closely resembles their ground-truth derivations.
Advisor: Aaron Elmore
Committee Members: Aaron Elmore, Michael Franklin, and Raul Castro Fernandez