[Theory] NOW: 10/17 Thesis Defense: Bowen Shi, TTIC
Mary Marre
mmarre at ttic.edu
Mon Oct 17 09:01:52 CDT 2022
*When*: Monday, October 17th from *9:00 - 11:00 am CT*
*Where*: Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually*: attend virtually *here
<https://uchicagogroup.zoom.us/j/96046080093?pwd=NVJ2NTNRV3ZaekgwOHJaaVN3QkxrQT09>*
*Who*: Bowen Shi, TTIC
*Thesis Title*: Toward American Sign Language Processing in the Real World:
Data, Tasks, and Methods
*Abstract*: Sign language, which conveys meaning through gestures, is the
chief means of communication among deaf people. Developing sign language
processing techniques would bridge the communication barrier between deaf
and hearing individuals and make artificial intelligence technologies more
accessible to the Deaf community. Most prior work on sign language
recognition has focused on studio datasets collected in a carefully
controlled environment. Such datasets are valuable baselines but
unrealistically simplistic, characterized by limited numbers of signers in
a visually simple setting.
In this thesis, we study automatic sign language processing "in the wild",
using signing videos collected from the Internet. Most of this thesis
concerns fingerspelling, an important component of sign language that has
not been widely studied in prior work. In American Sign Language (ASL),
fingerspelling accounts for 12-35% of ASL discourse and is used frequently
for content words in conversations involving current events or technical
topics. In Deaf online media, transcribing even just the fingerspelled
portions could add a great deal of value, since these portions are often
dense in such content words. I will present three large-scale ASL datasets
collected "in the wild": ChicagoFSWild, ChicagoFSWild+, and OpenASL.
ChicagoFSWild and ChicagoFSWild+ are two datasets of fingerspelling
sequences trimmed from raw sign language videos. OpenASL is a large-scale
open-domain real-world ASL-English translation dataset based on online
subtitled sign language videos. Based on ChicagoFSWild and ChicagoFSWild+,
we will address fingerspelling recognition, which consists of transcribing
fingerspelling sequences into text. To tackle the visual challenges in
real-world data, I will describe a recognition pipeline composed of a
special-purpose signing hand detector and a fingerspelling recognizer, and
an end-to-end approach based on an iterative attention mechanism that
allows fingerspelling to be recognized from raw video without explicit
hand detection. We further show that using a Conformer-based network jointly
modeling handshape and mouthing can bring significant gains to
fingerspelling recognition. Next, I will describe two important tasks in
building real-world fingerspelling-based applications: fingerspelling
detection and fingerspelling search. For fingerspelling detection, we
propose a suite of evaluation metrics and a new model that learns to detect
fingerspelling via multi-task training. To address the problem of searching
for fingerspelled keywords or key phrases in raw sign language videos, we
propose a novel method that jointly localizes and matches fingerspelling
segments to text based on fingerspelling detection. Finally, I will
describe a benchmark for large-vocabulary open-domain sign language
translation. To address the challenges of sign language translation in
realistic settings and without glosses, we propose a set of techniques
including sign search as a pretext task for pre-training and fusion of
mouthing and handshape features. I will conclude by discussing future
directions for sign language processing in the wild.
*Thesis Committee*: *Karen Livescu* <klivescu at ttic.edu> (thesis advisor),
Greg Shakhnarovich, Diane Brentari, Chris Dyer
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Chicago, IL 60637*
*mmarre at ttic.edu <mmarre at ttic.edu>*