[Theory] REMINDER: 8/24 Thesis Defense: Shubham Toshniwal, TTIC
Mary Marre
mmarre at ttic.edu
Sun Aug 21 11:50:26 CDT 2022
*Thesis Defense: Shubham Toshniwal, TTIC*
When: Wednesday, August 24th, *12:00 - 2:00 pm CT*
Where: Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
Virtually: attend virtually here
<https://uchicagogroup.zoom.us/meeting/register/tJUtcOqtrj0uHdSu3hNQyRXrEkZis5pz3vBa>
Who: Shubham Toshniwal, TTIC
Thesis Title: Efficient and Interpretable Neural Models for Entity Tracking
Abstract: What would it take for a natural language model to understand a
novel, such as The Lord of the Rings? Among other things, such a model must
be able to: (a) identify and record new characters (entities) and their
attributes as they are introduced in the text, and (b) identify subsequent
references to previously introduced characters and update their attributes.
This problem of entity tracking is essential for language understanding and
is thus useful for a wide array of downstream NLP applications such as
question answering and summarization.
In this thesis, we focus on two key problems in facilitating the use of
entity tracking models: (i) scaling entity tracking models to long
documents, such as novels, and (ii) integrating entity tracking into
language models. Applying language technologies to long documents has
garnered interest recently, but computational cost is a significant
bottleneck in scaling up current methods. In this thesis, we argue that
computationally efficient entity tracking models can be developed by
representing entities with rich, fixed-dimensional vector representations
derived from pretrained language models, and by exploiting the ephemeral
nature of entities. We also argue for integrating entity tracking into
language models, as doing so allows for: (i) wider applicability, given the
current ubiquitous use of pretrained language models in NLP applications,
and (ii) easier adoption, since it is much easier to swap in a new
pretrained language model than to integrate a separate standalone entity
tracking model.
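For intuition, here is a minimal Python sketch of deriving such a
fixed-dimensional mention vector from a pretrained encoder; the model
choice, mean-pooling, and function names are illustrative assumptions, not
the thesis's exact recipe:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")

def mention_vector(text: str, start: int, end: int) -> torch.Tensor:
    """Encode a document and mean-pool the token states of one mention span."""
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]            # (num_tokens, 2) char spans
    with torch.no_grad():
        states = encoder(**enc).last_hidden_state[0]  # (num_tokens, dim)
    in_span = ((offsets[:, 0] >= start) & (offsets[:, 1] <= end)
               & (offsets[:, 1] > offsets[:, 0]))     # drop special tokens
    return states[in_span].mean(dim=0)                # fixed-dimensional vector

doc = "Frodo inherited the Ring. He kept it hidden."
vec = mention_vector(doc, 0, 5)  # vector for the mention "Frodo"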
The thesis is divided into two parts. In the first half, we focus on a
specific class of entity tracking problems referred to as coreference
resolution, where the goal is to identify text spans referring to the same
entity. We propose memory models in which an external memory module is
trained to explicitly track the entities mentioned in the text. We first
discuss a sparsely supervised memory model for the pronoun resolution task;
at the time of its introduction, this model outperformed prior work on both
the end task and interpretability measures. We then adapt this memory model
to the full coreference resolution task. The proposed memory models scale
effectively to long documents; in particular, the proposed bounded memory
model offers runtime linear in document length while remaining competitive
with state-of-the-art models. Next, we test the presented models for their
generalization capability, specifically via zero-shot evaluation on other
coreference benchmarks. We find that domain shift is a challenge in
coreference resolution, though annotation differences across datasets
partly exaggerate this challenge, and that joint training on multiple
datasets moderately alleviates it. The presented models have achieved
state-of-the-art performance on multiple coreference benchmarks.
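To make the bounded-memory idea concrete, here is a purely illustrative
Python sketch of an entity memory with a fixed number of cells and a single
left-to-right pass over a document's mentions; the similarity threshold,
averaging update, and recency-based eviction are simplifications invented
for this sketch, not the thesis's actual model:

import torch

class BoundedEntityMemory:
    """Illustrative bounded memory: one fixed-dimensional vector per entity."""

    def __init__(self, num_cells: int = 20, dim: int = 768):
        self.vectors = torch.zeros(num_cells, dim)  # one slot per tracked entity
        self.last_used = torch.zeros(num_cells)     # recency, used for eviction
        self.step = 0

    def update(self, mention_vec: torch.Tensor, threshold: float = 0.5) -> int:
        """Link a mention to a tracked entity, or evict a stale slot for it."""
        self.step += 1
        sims = torch.cosine_similarity(self.vectors, mention_vec.unsqueeze(0), dim=1)
        cell = int(sims.argmax())
        if sims[cell] > threshold:        # coreferent with a tracked entity
            self.vectors[cell] = 0.5 * (self.vectors[cell] + mention_vec)
        else:                             # new entity: reuse the stalest slot
            cell = int(self.last_used.argmin())
            self.vectors[cell] = mention_vec
        self.last_used[cell] = self.step
        return cell                       # cluster id assigned to this mention

def resolve_document(mention_vectors, memory: BoundedEntityMemory):
    # One pass, O(mentions x cells): linear in document length, unlike
    # pairwise mention scoring, which is quadratic.
    return [memory.update(v) for v in mention_vectors]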
In the latter half, we focus on integrating entity tracking capability into
neural language models. As a first step, we propose language modeling for
the game of chess as a task for evaluating the entity tracking capabilities
of transformer LMs. Our experiments on chess suggest that augmenting LM
training instances with board state information (represented as text
tokens) improves both state tracking and language modeling performance.
Training LMs with state-augmented instances also allows probing for entity
state at inference time simply via prompting. Next, we extend these
findings from chess to natural language. We first experiment in a closed
domain, showing that state-augmented training improves both state tracking
performance and text generation quality. Finally, we adapt state-augmented
training to bake coreference knowledge into natural language models and
show improvements on a popular cloze task.
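As a rough illustration of state augmentation, the sketch below builds a
chess training string in which moves are periodically followed by the board
state serialized as text; it assumes the python-chess package, and the
"<state>" markers and FEN serialization are hypothetical choices, not
necessarily the thesis's encoding:

import chess  # the python-chess package

def state_augmented_instance(san_moves, every: int = 3) -> str:
    """Interleave chess moves with the board state, serialized as text."""
    board = chess.Board()
    pieces = []
    for i, move in enumerate(san_moves, start=1):
        board.push_san(move)
        pieces.append(move)
        if i % every == 0:        # periodically inject the board state
            pieces.append("<state> " + board.fen() + " </state>")
    return " ".join(pieces)

print(state_augmented_instance(["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"]))

An LM trained on such sequences can then be probed for piece (entity) state
at inference time simply by prompting it with the state marker.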
Thesis Committee: *Kevin Gimpel* (*Thesis Advisor*), *Karen Livescu*
(*Thesis Advisor*), Sam Wiseman, Kenton Lee, Yejin Choi
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Chicago, IL 60637*
*mmarre at ttic.edu*