[CS] REMINDER: Qiming Wang Dissertation Defense / October 3, 2024

Devin Davis via cs cs at mailman.cs.uchicago.edu
Mon Sep 30 10:00:58 CDT 2024


This is an announcement of Qiming Wang's Dissertation Defense
===============================================
Candidate: Qiming Wang

Date: Thursday, October 3rd, 2024

Time:  2pm - 4pm CT

Location:   JCL 236

Title:  Tabular Data Extraction and Discovery Using Natural Language Questions

Abstract: Tabular data extraction and discovery from a large corpus are two long-standing challenges in the data management community. Traditional solutions require substantial human effort to write rules or annotate training data, and this expensive manual work has to be repeated for each new domain of source corpus, which makes these solutions hard to scale. Can we avoid this repetitive, expensive manual work and still maintain comparable performance across different datasets? In this dissertation, we show that we can do both. 1) By reducing table extraction from a large text corpus to the task of question answering over that corpus, it is possible to build a table extraction system that, once trained, generalizes to other domains and so avoids repeating the manual work of collecting training data for new table domains. 2) Given any table corpus, by learning from the corpus itself with the help of a large language model, we can build a table discovery system that matches the query quality of systems trained on human-annotated data. Specifically, we build three systems/tools to demonstrate that these approaches work in practice. 1) FabricQA-Extractor: a system to extract tables from a text corpus using natural language questions. 2) SOLO: a self-supervised system for table discovery using natural language questions. 3) Pneuma-Benchmark: an automatic tool to evaluate models/systems for table discovery using natural language questions. Together, these works contribute a solution to tabular data extraction and discovery that requires little human effort.

Advisors:  Raul Castro Fernandez

Committee Members: Raul Castro Fernandez, Kyle Chard, Sanjay Krishnan




