[Colloquium] Tyler Skluzacek Candidacy Exam/Aug 23, 2021

nitayack at uchicago.edu
Mon Aug 16 04:33:58 CDT 2021


This is an announcement of Tyler Skluzacek's Candidacy Exam.
===============================================
Candidate: Tyler Skluzacek

Date: Monday, August 23, 2021

Time: 1 pm CDT

Via Zoom: https://uchicago.zoom.us/j/95620498814?pwd=bVkvNnBYcTBtc3NpbjB6eWQ1RWd6Zz09

Title: Can automated metadata extraction make distributed data swamps more navigable?

Abstract: Scientific data repositories are generally chaotic: files spanning heterogeneous domains, studies, and users are stuffed into an increasingly unsearchable data swamp without regard for organization, discoverability, or usability. Files that could contribute to a scientist's future research may be spread across multiple storage facilities and submerged beneath petabytes of other files, rendering manual annotation and navigation virtually impossible. To remedy this lack of navigability, scientists require a rich search index of metadata (data about data) extracted from individual files. In this thesis, I explore automated solutions for converting dark data swamps into navigable data collections, assuming no prior knowledge of each file's schema or provenance. I first address files of vastly different structures by building a robust suite of metadata extraction functions capable of processing a wide array of file types. To increase extraction efficiency, I then investigate automated file type identification methods so that only the applicable extraction functions are applied to each file.

Advisor: Ian Foster

Committee Members: Ian Foster, Kyle Chard, Michael Franklin, and Raul Castro Fernandez
