[Colloquium] 1/28 Talks at TTIC: Huda Khayrallah, Johns Hopkins University

Thu Jan 21 21:08:52 CST 2021

*When:*      Thursday, January 28th at *11:10 am CT*

*Where:*     Zoom Virtual Talk (*register in advance here
<https://uchicagogroup.zoom.us/webinar/register/WN_LXDxvgkcQJaDvB9ptbv2cQ>*)

*Who: *       Huda Khayrallah, Johns Hopkins University

*Title: *Machine Translation for All: Improving Machine Translation in Low
Resource, Domain Mismatch, and Low Resource Settings

*Abstract:* Machine translation uses machine learning to automatically
translate text from one language to another and has the potential to reduce
language barriers. Recent improvements in machine translation have made it
more widely usable, partly due to deep neural network approaches.
However—like most deep learning algorithms—neural machine translation is
sensitive to the quantity and quality of training data, and therefore
produces poor translations for some languages and styles of text. Machine
translation training data typically comes in the form of parallel
text—sentences translated between the two languages of interest. Limited
quantities of parallel text are available for most language pairs, leading
to a low-resource problem.  Even when training data is available in the
desired language pair, it is frequently formal text—leading to a domain
mismatch when models are used to translate a different type of data, such
as social media or medical text.  Neural machine translation currently
performs poorly in low-resource and domain mismatch settings; my work aims
to overcome these limitations, and make machine translation a useful tool
for all users.

In this talk, I will discuss a method for improving
translation in low resource settings—Simulated Multiple Reference Training
(SMRT; Khayrallah et al., 2020)—which uses a paraphraser to simulate
training on all possible translations per sentence. I will also discuss
work on improving domain adaptation (Khayrallah et al., 2018), and work on
analyzing the effect of noisy training data (Khayrallah and Koehn, 2018).

*Bio: *Huda Khayrallah is a PhD candidate in Computer Science at The Johns
Hopkins University where she is advised by Philipp Koehn. She is part of
the Center for Language and Speech Processing and the machine translation
group. She works on applied machine learning for natural language
processing, primarily machine translation. Her work focuses on overcoming
deep learning’s sensitivity to the quantity and quality of the training
data, including low resource and domain adaptation settings. In Summer
2019, she was a research intern at Lilt, working on translator-in-the-loop
machine translation. She holds an MSE in Computer Science from Johns
Hopkins (2017), and a BA in Computer Science from UC Berkeley (2015). More
information about her can be found on her website:
http://www.cs.jhu.edu/~huda

*Host:* Kevin Gimpel <kgimpel at ttic.edu>

Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL  60637*
*p:(773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu*