[Colloquium] FW: [CS] Suhail Rehman MS Presentation/May 19, 2021
Rene Noyola
rnoyola at uchicago.edu
Fri May 14 10:28:45 CDT 2021
Reminder - Tomorrow - Suhail Rehman MS Presentation/May 19, 2021
This is an announcement of Suhail Rehman's MS Presentation.
===============================================
Date: Wednesday, May 19, 2021
Time: 10:00 AM CDT
Location: via Zoom
https://uchicago.zoom.us/j/95576854504?pwd=eEhDeGcvV28vdDNodGhWNlZoR1JIZz09
Meeting ID: 955 7685 4504
Passcode: 152945
M.S. Candidate: Suhail Rehman
M.S. Paper Title: Sifting Through Data Relics: An Automated Framework for Retrospective Analysis of Data Artifacts
Abstract: Over the data science lifecycle, data scientists work with datasets using a variety of tools, including spreadsheets, computational notebooks, and ad-hoc scripts, for a range of tasks, including cleaning, integration, feature engineering, and visualization. This ad-hoc, heterogeneous process of data science typically results in multiple versions of the dataset(s) recorded as artifacts, operated on by various tools, even within a single data science workflow. Lineage information, including source datasets, data transformation programs or scripts, or manual annotations, is rarely captured, making it difficult to infer the relationships between artifacts in a given workflow retrospectively. We introduce the problem of retrospective lineage inference, wherein, given a collection of tabular artifacts, the goal is to reconstruct a lineage graph that resembles their true evolution in the data analysis workflows that generated them, aiding reproducibility, explainability, and long-term maintenance. Our technique for retrospective lineage inference, RELIC, differentiates between operations that keep row and column correspondences intact and those that do not; we use fine-grained similarity metrics to infer relationships for the former and targeted set containment-based detectors for the latter. RELIC can reconstruct lineage graphs from artifacts in a representative sample of real-world Jupyter notebooks with an average F1 score of ~0.91 without access to code, documentation, or other metadata.
Advisor: Aaron Elmore
Committee Members: Aaron Elmore, Raul Castro Fernandez, and Michael Franklin