<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div><b>When</b>:    Tuesday, May 7th from <span style="background-color:rgb(255,255,0)"><b>1-3 pm CT</b></span></div><div><b><br></b></div><div><b>Where</b>:  Talk will be given <b><font color="#0000ff">live, in-person</font></b> at<br>              TTIC, 6045 S. Kenwood Avenue<br>              5th Floor, <b><u><font color="#000000">Room 529</font></u></b><b><br></b><br><b>Virtually</b>: via <a href="https://uchicago.zoom.us/j/99791771694?pwd=VXVqY2dXQUFlSHBjeU1WUndOSXd5QT09" target="_blank"><b>Zoom</b></a> <br></div><div><br><b>Who</b>:       Freda Shi, TTIC</div><div><br></div><div><div class="MsoNormal" align="center" style="margin:0in 0in 8pt;text-align:center;line-height:15.6933px;font-size:11pt;font-family:Calibri,sans-serif"><hr size="2" width="100%" align="center"></div></div><div><div><b>Title:</b>      Learning Language Structures Through Grounding</div><div><br></div><div><div><b>Abstract: </b>Language is highly structured, with syntactic and semantic structures that are, to some extent, agreed upon by speakers of the same language. With implicit or explicit awareness of such structures, humans can learn and use language efficiently and generalize to sentences that contain unseen words. Instead of learning such structures from explicit manual annotations, in this dissertation, we consider a family of task formulations that aim to learn language structures through <i>grounding</i>. We seek distant supervision from other data sources (i.e., grounds), including but not limited to other modalities (e.g., vision), execution results of programs, and other languages. The grounds are connected to the language system in various forms, allowing language structures to be learned through grounded supervision signals.</div><br>We demonstrate and advocate for the potential of this task formulation in three schemes, each presented in a separate part of this dissertation. 
In Part I, we consider learning syntactic parses through visual grounding. We propose the task of visually grounded grammar induction, which aims to learn to predict the constituency parse tree of a sentence by reading the sentence and looking at a corresponding image. We present the first methods to induce syntactic structures from visually grounded text and speech and find that the visual grounding signals can help improve the parsing performance over text-only models. As a side contribution, we propose a novel evaluation metric that enables the evaluation of speech parsing without involving text or automatic speech recognition systems. In Part II, we propose two methods to map sentences into corresponding semantic structures (i.e., programs) under the supervision of execution results. One of them enables nearly perfect compositional generalization to unseen sentences with mild assumptions on domain knowledge, and the other significantly improves the performance of few-shot semantic parsing by leveraging the execution results of programs as a source of grounding signals. In Part III, we propose methods that learn language structures from annotations in other languages. Specifically, we propose a method that sets a new state of the art on cross-lingual word alignment, without using any annotated parallel data. 
We then leverage the learned word alignments to improve the performance of zero-shot cross-lingual dependency parsing, by proposing a novel substructure-based projection method that preserves structural knowledge learned from the source language.<br></div><div><br></div><div><b><span class="gmail-il">Thesis</span> Committee: <a href="mailto:klivescu@ttic.edu" target="_blank">Karen Livescu</a>, </b><b><a href="mailto:kgimpel@ttic.edu" target="_blank">Kevin Gimpel</a> </b>(<font face="georgia, serif"><span class="gmail-il">Thesis</span> Advisors</font>); Luke Zettlemoyer (UW), Roger Levy (MIT)</div></div><div><br></div><div><br></div><div><br></div></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div></div>