[Theory] REMINDER: 1/28 Talks at TTIC: Huda Khayrallah, Johns Hopkins University
Mary Marre
mmarre at ttic.edu
Wed Jan 27 14:03:29 CST 2021
*When:* Thursday, January 28th at *11:10 am CT*
*Where:* Zoom Virtual Talk (*register in advance here
<https://uchicagogroup.zoom.us/webinar/register/WN_LXDxvgkcQJaDvB9ptbv2cQ>*)
*Who: * Huda Khayrallah, Johns Hopkins University
*Title: *Machine Translation for All: Improving Machine Translation in Low
Resource, Domain Mismatch, and Noisy Data Settings
*Abstract:* Machine translation uses machine learning to automatically
translate text from one language to another and has the potential to reduce
language barriers. Recent improvements in machine translation have made it
more widely usable, partly due to deep neural network approaches.
However—like most deep learning algorithms—neural machine translation is
sensitive to the quantity and quality of training data, and therefore
produces poor translations for some languages and styles of text. Machine
translation training data typically comes in the form of parallel
text—sentences translated between the two languages of interest. Limited
quantities of parallel text are available for most language pairs, leading
to a low-resource problem. Even when training data is available in the
desired language pair, it is frequently formal text—leading to a domain
mismatch when models are used to translate a different type of data, such
as social media or medical text. Neural machine translation currently
performs poorly in low-resource and domain mismatch settings; my work aims
to overcome these limitations and make machine translation a useful tool
for all users. In this talk, I will discuss a method for improving
translation in low-resource settings—Simulated Multiple Reference Training
(SMRT; Khayrallah et al., 2020)—which uses a paraphraser to simulate
training on all possible translations per sentence. I will also discuss
work on improving domain adaptation (Khayrallah et al., 2018), and work on
analyzing the effect of noisy training data (Khayrallah and Koehn, 2018).
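As a rough, hypothetical illustration of the general idea (not the actual SMRT implementation), the Python sketch below expands each (source, reference) pair with sampled paraphrases of the reference, so a model trained on the result sees several valid targets per sentence. The sample_paraphrases stub is a placeholder for a trained neural paraphraser.

import random


def sample_paraphrases(reference, k=3):
    """Placeholder paraphraser: returns trivial variants of the reference.
    In a real system this would be a trained neural paraphrase model
    sampled k times."""
    variants = [
        reference,
        reference.replace("movie", "film"),
        reference.replace("big", "large"),
    ]
    unique = list(dict.fromkeys(variants))  # deduplicate, keep order
    return random.sample(unique, k=min(k, len(unique)))


def augment_parallel_data(pairs, k=3):
    """Expand each (source, reference) pair into up to k
    (source, paraphrased reference) training pairs."""
    augmented = []
    for src, ref in pairs:
        for para in sample_paraphrases(ref, k):
            augmented.append((src, para))
    return augmented


if __name__ == "__main__":
    data = [("un grand film", "a big movie")]
    for src, tgt in augment_parallel_data(data, k=3):
        print(src, "->", tgt)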
*Bio: *Huda Khayrallah is a PhD candidate in Computer Science at The Johns
Hopkins University, where she is advised by Philipp Koehn. She is part of
the Center for Language and Speech Processing and the machine translation
group. She works on applied machine learning for natural language
processing, primarily machine translation. Her work focuses on overcoming
deep learning’s sensitivity to the quantity and quality of the training
data, including low-resource and domain adaptation settings. In Summer
2019, she was a research intern at Lilt, working on translator-in-the-loop
machine translation. She holds an MSE in Computer Science from Johns
Hopkins (2017), and a BA in Computer Science from UC Berkeley (2015). More
information about her can be found on her website: http://www.cs.jhu.edu/~huda
*Host:* Kevin Gimpel <kgimpel at ttic.edu>
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL 60637*
*p: (773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*