<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-ligatures:standardcontextual;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:black;background:white">This is an announcement of Pranav Subramaniam's MS Presentation</span><span style="color:black"><br>
<span style="background:white">===============================================</span><br>
<span style="background:white">Candidate: Pranav Subramaniam</span><br>
<br>
<span style="background:white">Date: Friday, September 22, 2023</span><br>
<br>
<span style="background:white">Time: 10:30 am CST</span><br>
<br>
<span style="background:white">Location: JCL 011</span><br>
<br>
<span style="background:white">M.S. Paper Title: LLM-AS-A-CROWD: HOW TO USE LLMS FOR DATA INTEGRATION TASKS</span><br>
<br>
<span style="background:white">Abstract: Large Language Models (LLMs) are capable of answering questions without task-specific</span><br>
<span style="background:white">training data, which creates an opportunity to address data integration tasks such as entity</span><br>
<span style="background:white">resolution, joinability, and unionability. Solving these tasks requires incorporating semantic</span><br>
<span style="background:white">knowledge, which has been a hard-to-address challenge. LLMs exhibit a tremendous capacity</span><br>
<span style="background:white">to understand nuance in language and thus are promising for these tasks. However, using them</span><br>
<span style="background:white">requires addressing two challenges: i) how to query them to obtain valid answers, i.e., prompt</span><br>
<span style="background:white">engineering; and ii) how best to incorporate them into today’s software pipelines.</span><br>
<span style="background:white">In this paper, we study the potential of LLMs for data integration tasks. We find that</span><br>
<span style="background:white">thinking of LLMs-as-a-crowd is a useful mental model to leverage them for data integration</span><br>
<span style="background:white">tasks because it yields high-quality results without any access to training data (which</span><br>
<span style="background:white">many state-of-the-art methods rely on) and without any dependence on good prompt engineering</span><br>
<span style="background:white">skills. We integrate LLMs into software pipelines that leverage the vast research in</span><br>
<span style="background:white">the areas of entity resolution, joinability, and unionability. We find that LLMs are effective</span><br>
<span style="background:white">as an aid to, but not a replacement for, software integration pipelines, thus effectively building</span><br>
<span style="background:white">on previous efforts.</span><br>
<span style="background:white">We obtain state-of-the-art results for the three tasks we study: entity resolution, joinability,</span><br>
<span style="background:white">and unionability. We also demonstrate that thinking of LLMs-as-a-crowd is effective and</span><br>
<span style="background:white">complementary to other methods, such as few-shot learning. All in all, our experimental</span><br>
<span style="background:white">evaluation paves the way for further study of the use of LLMs for data integration tasks.</span><br>
<br>
<span style="background:white">Advisor: Raul Castro Fernandez</span><br>
<br>
<span style="background:white">Committee: Raul Castro Fernandez, Aaron Elmore, and Sanjay Krishnan</span></span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>