<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div><b>When</b>: Monday, October 17th from <b>9:00 - 11:00 am CT</b><br><br><b>Where</b>: Talk will be given <b><font color="#0000ff">live, in-person</font></b> at<br> TTIC, 6045 S. Kenwood Avenue<br> 5th Floor, Room 530</div><div><br><b>Virtually</b>: attend virtually <b><font color="#0000ff"><a href="https://uchicagogroup.zoom.us/j/96046080093?pwd=NVJ2NTNRV3ZaekgwOHJaaVN3QkxrQT09" target="_blank">here</a></font></b><br><br><b>Who</b>: Bowen Shi, TTIC<br><br><b>Thesis Title</b>: Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods<br><br><b>Abstract</b>: Sign language, which conveys meaning through gestures, is the chief means of communication among deaf people. Developing sign language processing techniques would bridge the communication barrier between deaf and hearing individuals and make artificial intelligence technologies more accessible to the Deaf community. Most prior work on sign language recognition has focused on studio datasets collected in a carefully controlled environment. Such datasets are valuable baselines but unrealistically simplistic, characterized by limited numbers of signers in a visually simple setting.</div><div><br>In this thesis, we study automatic sign language processing ”in the wild”, using signing videos collected from the Internet. Most of this thesis will regard fingerspelling, which is an important component of sign language and yet has not been studied widely by prior work. In American Sign Language (ASL), fingerspelling accounts for 12-35% of whole ASL discourse and is used frequently for content words in conversations involving current events or technical topics. In Deaf online media, transcribing even only the fingerspelled portions could add a great deal of value since these portions are often dense in such content words. 
I will present three large-scale ASL datasets collected &ldquo;in the wild&rdquo;: ChicagoFSWild, ChicagoFSWild+, and OpenASL. ChicagoFSWild and ChicagoFSWild+ are two datasets of fingerspelling sequences trimmed from raw sign language videos. OpenASL is a large-scale open-domain real-world ASL-English translation dataset based on online subtitled sign language videos. Based on ChicagoFSWild and ChicagoFSWild+, we will address fingerspelling recognition: transcribing fingerspelling sequences into text. To tackle the visual challenges in real-world data, I will describe a recognition pipeline composed of a special-purpose signing hand detector and a fingerspelling recognizer, as well as an end-to-end approach based on an iterative attention mechanism that recognizes fingerspelling from raw video without explicit hand detection. We further show that a Conformer-based network jointly modeling handshape and mouthing brings significant gains in fingerspelling recognition. Next, I will describe two tasks important for building real-world fingerspelling-based applications: fingerspelling detection and fingerspelling search. For fingerspelling detection, we propose a suite of evaluation metrics and a new model that learns to detect fingerspelling via multi-task training. To address the problem of searching for fingerspelled keywords or key phrases in raw sign language videos, we propose a novel method that jointly localizes fingerspelling segments and matches them to text, building on fingerspelling detection. Finally, I will describe a benchmark for large-vocabulary open-domain sign language translation. To address the challenges of sign language translation in realistic settings and without glosses, we propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features.
I will conclude by discussing future directions for sign language processing in the wild.</div><div><br></div><div><b>Thesis Committee</b>: <a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b></a> (thesis advisor), Greg Shakhnarovich, Diane Brentari, Chris Dyer</div><div><br></div></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div></div>