[Theory] 10/17 Thesis Defense: Bowen Shi, TTIC
Mary Marre
mmarre at ttic.edu
Mon Oct 10 21:47:47 CDT 2022
*When*: Monday, October 17th from *9:00 - 11:00 am CT*
*Where*: Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually*: attend virtually *here
<https://uchicagogroup.zoom.us/j/96046080093?pwd=NVJ2NTNRV3ZaekgwOHJaaVN3QkxrQT09>*
*Who*: Bowen Shi, TTIC
*Thesis Title*: Toward American Sign Language Processing in the Real World:
Data, Tasks, and Methods
*Abstract*: Sign language, which conveys meaning through gestures, is the
chief means of communication among deaf people. Developing sign language
processing techniques would bridge the communication barrier between deaf
and hearing individuals and make artificial intelligence technologies more
accessible to the Deaf community. Most prior work on sign language
recognition has focused on studio datasets collected in a carefully
controlled environment. Such datasets are valuable baselines but
unrealistically simplistic, characterized by limited numbers of signers in
a visually simple setting.
In this thesis, we study automatic sign language processing "in the wild",
using signing videos collected from the Internet. Most of this thesis
concerns fingerspelling, an important component of sign language that has
not been widely studied in prior work. In American Sign Language
(ASL), fingerspelling accounts for 12-35% of ASL discourse and is
used frequently for content words in conversations involving current events
or technical topics. In Deaf online media, transcribing even only the
fingerspelled portions could add a great deal of value since these portions
are often dense in such content words. I will present three large-scale ASL
datasets "in the wild": ChicagoFSWild, ChicagoFSWild+, and OpenASL.
ChicagoFSWild and ChicagoFSWild+ are two datasets of fingerspelling
sequences trimmed from raw sign language videos. OpenASL is a large-scale
open-domain real-world ASL-English translation dataset based on online
subtitled sign language videos. Based on ChicagoFSWild and ChicagoFSWild+,
we will address fingerspelling recognition, which consists of transcribing
fingerspelling sequences into text. To tackle the visual challenges in
real-world data, I will describe a recognition pipeline composed of a
special-purpose signing hand detector and a fingerspelling recognizer, and
an end-to-end approach based on an iterative attention mechanism that allows
recognizing fingerspelling from a raw video without explicit hand
detection. We further show that using a Conformer-based network jointly
modeling handshape and mouthing can bring significant gains to
fingerspelling recognition. Next, I will describe two important tasks in
building real-world fingerspelling-based applications: fingerspelling
detection and fingerspelling search. For fingerspelling detection, we
propose a suite of evaluation metrics and a new model that learns to detect
fingerspelling via multi-task training. To address the problem of searching
for fingerspelled keywords or key phrases in raw sign language videos, we
propose a novel method that jointly localizes and matches fingerspelling
segments to text based on fingerspelling detection. Finally, I will
describe a benchmark for large-vocabulary open-domain sign language
translation. To address the challenges of sign language translation in
realistic settings and without glosses, we propose a set of techniques
including sign search as a pretext task for pre-training and fusion of
mouthing and handshape features. I will conclude by discussing future
directions for sign language processing in the wild.
*Thesis Committee*: *Karen Livescu* <klivescu at ttic.edu> (thesis advisor),
Greg Shakhnarovich, Diane Brentari, Chris Dyer
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Chicago, IL 60637*
*mmarre at ttic.edu*