[Theory] 5/7 Thesis Defense: Freda Shi, TTIC

Mary Marre mmarre at ttic.edu
Mon Apr 29 15:53:00 CDT 2024


*When*:    Tuesday, May 7th from *1 - 3 pm CT*

*Where*:  Talk will be given *live, in-person* at
              TTIC, 6045 S. Kenwood Avenue
              5th Floor, *Room 529*

*Virtually*: via *Zoom*
<https://uchicago.zoom.us/j/99791771694?pwd=VXVqY2dXQUFlSHBjeU1WUndOSXd5QT09>


*Who*:       Freda Shi, TTIC

------------------------------
*Title:*      Learning Language Structures Through Grounding

*Abstract:* Language is highly structured, with syntactic and semantic
structures, to some extent, agreed upon by speakers of the same language.
With implicit or explicit awareness of such structures, humans can learn
and use language efficiently and generalize to sentences that contain
unseen words. Instead of learning such structures from explicit manual
annotations, in this dissertation we consider a family of task formulations
that aim to learn language structures through *grounding*. We seek distant
supervision from other data sources (i.e., grounds), including but not
limited to other modalities (e.g., vision), execution results of programs,
and other languages. The grounds are connected to the language system
through various forms, allowing language structures to be learned through
grounded supervision signals.

We demonstrate and advocate for the potential of this task formulation in
three schemes, each shown in a separate part of this dissertation. In Part
I, we consider learning syntactic parses through visual grounding. We
propose the task of visually grounded grammar induction, which aims at
learning to predict the constituency parse tree of a sentence by reading
the sentence and looking at a corresponding image. We present the first
methods to induce syntactic structures from visually grounded text and
speech and find that the visual grounding signals can help improve the
parsing performance over text-only models. As a side contribution, we
propose a novel evaluation metric that enables the evaluation of speech
parsing without involving text or automatic speech recognition systems. In
Part II, we propose two methods to map sentences into corresponding
semantic structures (i.e., programs) under the supervision of execution
results. One of them enables nearly perfect compositional generalization to
unseen sentences with mild assumptions on domain knowledge, and the other
significantly improves the performance of few-shot semantic parsing by
leveraging the execution results of programs as a source of grounding
signals. In Part III, we propose methods that learn language structures
from annotations in other languages. Specifically, we propose a method that
sets a new state-of-the-art performance on cross-lingual word alignment,
without using any annotated parallel data. We then leverage the learned
word alignments to improve the performance of zero-shot cross-lingual
dependency parsing, by proposing a novel substructure-based projection
method that preserves structural knowledge learned from the source language.

*Thesis Committee:* Karen Livescu <klivescu at ttic.edu>, Kevin Gimpel
<kgimpel at ttic.edu> (Thesis Advisors); Luke Zettlemoyer (UW), Roger Levy
(MIT)




Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL  60637*
*773-834-1757*
*mmarre at ttic.edu*