<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-ligatures:standardcontextual;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:black;background:white">This is an announcement of Pranav Subramaniam's MS Presentation</span><span style="color:black"><br>
<span style="background:white">===============================================</span><br>
<span style="background:white">Candidate: Pranav Subramaniam</span><br>
<br>
<span style="background:white">Date: Friday, September 22, 2023</span><br>
<br>
<span style="background:white">Time: 10:30 am CST</span><br>
<br>
<span style="background:white">Location: JCL 011</span><br>
<br>
<span style="background:white">M.S. Paper Title: LLM-AS-A-CROWD: HOW TO USE LLMS FOR DATA INTEGRATION TASKS</span><br>
<br>
<span style="background:white">Abstract: Large Language Models (LLMs) are capable of answering questions without task-specific</span><br>
<span style="background:white">training data, which creates an opportunity to address data integration tasks such as entity</span><br>
<span style="background:white">resolution, joinability, and unionability. Solving these tasks requires incorporating semantic</span><br>
<span style="background:white">knowledge, which has been a hard-to-address challenge. LLMs exhibit a tremendous capacity</span><br>
<span style="background:white">to understand nuance in language and thus are promising for these tasks. However, using them</span><br>
<span style="background:white">requires addressing two challenges: i) how to query them to obtain valid answers, i.e., prompt</span><br>
<span style="background:white">engineering; and ii) how best to incorporate them into today’s software pipelines.</span><br>
<span style="background:white">In this paper, we study the potential of LLMs for data integration tasks. We find that</span><br>
<span style="background:white">thinking of LLMs-as-a-crowd is a useful mental model to leverage them for data integration</span><br>
<span style="background:white">tasks because it yields high-quality results without any access to training data (which</span><br>
<span style="background:white">many state-of-the-art methods rely on) and without any dependence on good prompt engineering</span><br>
<span style="background:white">skills. We integrate LLMs into software pipelines that leverage the vast research in</span><br>
<span style="background:white">the areas of entity resolution, joinability, and unionability. We find that LLMs are effective</span><br>
<span style="background:white">as an aid to, but not a replacement for, software integration pipelines, thus effectively building</span><br>
<span style="background:white">on previous efforts.</span><br>
<span style="background:white">We obtain state-of-the-art results for the three tasks we study: entity resolution, joinability,</span><br>
<span style="background:white">and unionability. We also demonstrate that thinking of LLMs-as-a-crowd is effective and</span><br>
<span style="background:white">complementary to other methods, such as few-shot learning. All in all, our experimental</span><br>
<span style="background:white">evaluation paves the way for further study of the use of LLMs for data integration tasks.</span><br>
<br>
<span style="background:white">Advisor: Raul Castro Fernandez</span><br>
<br>
<span style="background:white">Committee: Raul Castro Fernandez, Aaron Elmore, and Sanjay Krishnan</span></span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>