[Colloquium] FW: [CS] Suhail Rehman MS Presentation/May 19, 2021

Rene Noyola rnoyola at uchicago.edu
Fri May 14 10:28:45 CDT 2021

Reminder - Tomorrow - Suhail Rehman MS Presentation/May 19, 2021

    This is an announcement of Suhail Rehman's MS Presentation.
    Date: Wednesday, May 19, 2021
    Time: 10:00 AM CDT
    Location: via zoom
    Meeting ID: 955 7685 4504
    Passcode: 152945
    M.S. Candidate: Suhail Rehman
    M.S. Paper Title: Sifting Through Data Relics: An Automated Framework for Retrospective Analysis of Data Artifacts
    Abstract: Over the data science lifecycle, data scientists work with datasets using a variety of tools, including spreadsheets, computational notebooks, and ad-hoc scripts, for a range of tasks, including cleaning, integration, feature engineering, and visualization. This ad-hoc, heterogeneous process of data science typically results in multiple versions of the dataset(s) recorded as artifacts, operated on by various tools, even within a single data science workflow. Lineage information, including source datasets, data transformation programs or scripts, or manual annotations, is rarely captured, making it difficult to infer the relationships between artifacts in a given workflow retrospectively. We introduce the problem of retrospective lineage inference, wherein, given a collection of tabular artifacts, the goal is to reconstruct a lineage graph that resembles their true evolution in the data analysis workflows that generated them, aiding reproducibility, explainability, and long-term maintenance. Our technique for retrospective lineage inference, RELIC, differentiates between operations that keep row and column correspondences intact and those that do not; we use fine-grained similarity metrics to infer relationships for the former and targeted set containment-based detectors for the latter. RELIC can reconstruct lineage graphs from artifacts in a representative sample of real-world Jupyter notebooks with an average F1 score of ~0.91 without access to code, documentation, or other metadata.
    Advisor: Aaron Elmore
    Committee Members: Aaron Elmore, Raul Castro Fernandez, and Michael Franklin