<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body>


<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">


<div class="PlainText">This is an announcement of Pranav Subramaniam's MS Presentation<br>


===============================================<br>


Candidate: Pranav Subramaniam<br>


<br>


Date: Friday, September 22, 2023<br>


<br>


Time: 10:30 am CDT<br>


<br>


Location: JCL 011<br>


<br>


M.S. Paper Title: LLM-AS-A-CROWD: HOW TO USE LLMS FOR DATA INTEGRATION TASKS<br>


<br>


Abstract: Large Language Models (LLMs) are capable of answering questions without task-specific<br>


training data, which creates an opportunity to address data integration tasks such as entity<br>


resolution, joinability, and unionability. Solving these tasks requires incorporating semantic<br>


knowledge, which has been a hard-to-address challenge. LLMs exhibit a tremendous capacity<br>


to understand nuance in language and thus are promising for these tasks. However, using them<br>


requires addressing two challenges: i) how to query them to obtain valid answers, i.e., prompt<br>


engineering; ii) how to best incorporate them into today’s software pipelines.<br>


In this paper, we study the potential of LLMs for data integration tasks. We find that<br>


thinking of LLMs-as-a-crowd is a useful mental model to leverage them for data integration<br>


tasks because it yields high-quality results without any access to training data (which<br>


many state-of-the-art methods rely on) and without any dependence on good prompt engineering<br>


skills. We integrate LLMs into software pipelines that leverage the vast research in<br>


the areas of entity resolution, joinability, and unionability. We find that LLMs are effective<br>


as an aid to, but not a replacement for, software integration pipelines, thus effectively building<br>


on previous efforts.<br>


We obtain state-of-the-art results for the three tasks we study: entity resolution, joinability,<br>


and unionability. We also demonstrate that thinking of LLMs-as-a-crowd is effective and<br>


complementary to other methods, such as few-shot learning. All in all, our experimental<br>


evaluation paves the way for further study of the use of LLMs for data integration tasks.<br>


<br>


Advisor: Raul Castro Fernandez<br>


<br>


Committee: Raul Castro Fernandez, Aaron Elmore, and Sanjay Krishnan<br>


<br>




</div>


</span></font></div>




</body>


</html>