<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body>
<div class="" style="word-wrap:break-word">
<div class="" style="word-wrap:break-word; line-break:after-white-space">
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
This is an announcement of Suhail Rehman's Candidacy Exam.</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
===============================================</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Candidate: Suhail Rehman</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Date: Wednesday, February 16, 2022</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Time: 12:30 pm CST</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Remote Location: <span class="x_x_apple-converted-space"> </span><a href="https://urldefense.com/v3/__https://uchicago.zoom.us/j/95986092331?pwd=bjFrNzBwUzNBaTJLUXh5dWJkQnduQT09__;!!BpyFHLRN4TMTrA!vev8HC_r72mM6IO5aY6ff3kIAjoluwaRLFHytCEG4-H9g_I041ntiKUv9nOlx7urVa_Z_Yyd$" title="https://uchicago.zoom.us/j/95986092331?pwd=bjFrNzBwUzNBaTJLUXh5dWJkQnduQT09" class="" style="color:rgb(5,99,193)">https://uchicago.zoom.us/j/95986092331?pwd=bjFrNzBwUzNBaTJLUXh5dWJkQnduQT09</a><span class="x_x_apple-converted-space"> </span>Meeting
 ID: 959 8609 2331 Passcode: 517138</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Title: Reconstructing the Lineage of Artifacts in Data Lakes</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Abstract: As organizations' data storage, archival, and retrieval needs have evolved beyond simple<span class="x_x_apple-converted-space"> </span></div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
relational databases to complex collaborative data lake architectures, a data stewardship<span class="x_x_apple-converted-space"> </span></div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
and organization crisis has emerged. It is difficult to reason about existing data artifacts in<span class="x_x_apple-converted-space"> </span></div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
data lakes without reliable lineage information and metadata. There is substantial interest in</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
automated tools and techniques for inferring the lineage of data artifacts within data lakes.</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
In this thesis, we study lineage in data lakes, in particular the possibility, accuracy,</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
and efficiency of retrospectively constructing the lineage of data artifacts within a data lake.</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Specifically, we propose a method that retrospectively uses sketch and estimation</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
techniques, as well as similarity functions applied at various levels of granularity, to (a) organize</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
data artifacts into the individual workflows that generated them, and (b) infer a lineage tree for</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
these artifacts that most closely resembles their ground-truth derivations.</div>
</div>
</div>
<div class="" style="word-wrap:break-word">
<div class="" style="word-wrap:break-word; line-break:after-white-space">
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
<br class="">
</div>
</div>
<div class="" style="word-wrap:break-word; line-break:after-white-space">
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Advisor: Aaron Elmore</div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
 </div>
<div class="" style="margin:0in; font-size:11pt; font-family:Calibri,sans-serif">
Committee Members: Aaron Elmore, Michael Franklin, and Raul Castro Fernandez</div>
</div>
</div>
</body>
</html>