[Colloquium] Tyler Skluzacek Dissertation Defense/May 20, 2022
Megan Woodward
meganwoodward at uchicago.edu
Thu May 19 09:39:11 CDT 2022
This is an announcement of Tyler Skluzacek's Dissertation Defense.
===============================================
Candidate: Tyler Skluzacek
Date: Friday, May 20, 2022
Time: 10:30 am CDT
Remote Location: https://zoom.us/j/96049739931?pwd=TWszL08rRWpjY2xISlN1ZU9aNEozUT09
Meeting ID: 960 4973 9931
Passcode: 8EL1yQ
Location: JCL 011
Title: AUTOMATED METADATA EXTRACTION CAN MAKE DATA SWAMPS MORE NAVIGABLE
Abstract: In a science utopia, every research repository would be accompanied by a database of rich, searchable metadata that users can quickly and confidently query to discover, retrieve, and organize the many artifacts of research workflows. In practice, science is far from this utopia; repositories commonly decay into disorganized data swamps that overwhelm scientists and result in crucial research data being inaccessible to those who could make use of them. To dredge data swamps, I describe an automated metadata extraction system for science—Xtract—that crawls large repositories, dynamically constructs extraction workflows by intelligently mapping extractors to diverse file types, scalably executes these workflows on distributed research cyberinfrastructure, and publishes the contents into a search index. I show via a user study that an Xtract-generated search index drastically increases the speed and confidence with which researchers navigate their science collections. Finally, I highlight the benefits of this approach by applying Xtract to real-world repositories collectively spanning over 6 million files and 1PB of data across materials science, climate science, battery modeling, and spectroscopy repositories.
Advisors: Kyle Chard and Ian Foster
Committee Members: Ian Foster, Kyle Chard, Michael Franklin, and Raul Castro Fernandez
https://drive.google.com/file/d/1d-lu3t9LCc7okoV_yOQ4j-xUDgFtygg7/view?usp=sharing