[Colloquium] Pranav Subramaniam MS Presentation/Sep 22, 2023

Megan Woodward meganwoodward at uchicago.edu
Mon Sep 11 09:24:57 CDT 2023


This is an announcement of Pranav Subramaniam's MS Presentation
===============================================
Candidate: Pranav Subramaniam

Date: Friday, September 22, 2023

Time: 10:30 am CDT

Location: JCL 011

M.S. Paper Title: LLM-AS-A-CROWD: HOW TO USE LLMS FOR DATA INTEGRATION TASKS

Abstract: Large Language Models (LLMs) are capable of answering questions without task-specific
training data, which creates an opportunity to address data integration tasks such as entity
resolution, joinability, and unionability. Solving these tasks requires incorporating semantic
knowledge, which has been a hard-to-address challenge. LLMs exhibit a tremendous capacity
to understand nuance in language and thus are promising for this task. However, using them
requires addressing two challenges: i) how to query them to obtain valid answers, i.e., prompt
engineering; and ii) how best to incorporate them into today’s software pipelines.
In this paper, we study the potential of LLMs for data integration tasks. We find that
thinking of LLMs-as-a-crowd is a useful mental model to leverage them for data integration
tasks because it yields high-quality results without any access to training data (which
many state-of-the-art methods rely on) and without any dependence on good prompt-engineering
skills. We integrate LLMs into software pipelines that leverage the vast research in
the areas of entity resolution, joinability, and unionability. We find that LLMs are effective
as an aid to, but not a replacement for, software integration pipelines, thus effectively building
on previous efforts.
We obtain state-of-the-art results for the three tasks we study: entity resolution, joinability,
and unionability. We also demonstrate that thinking of LLMs-as-a-crowd is effective and
complementary to other methods, such as few-shot learning. All in all, our experimental
evaluation paves the way for further study of the use of LLMs for data integration tasks.

Advisor: Raul Castro Fernandez

Committee: Raul Castro Fernandez, Aaron Elmore, and Sanjay Krishnan








-------------- next part --------------
A non-text attachment was scrubbed...
Name: Pranav_Subramaniam_MS_paperv3.pdf
Type: application/pdf
Size: 652671 bytes
Desc: Pranav_Subramaniam_MS_paperv3.pdf
URL: <http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20230911/f6e3e265/attachment-0001.pdf>

