<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div class="gmail_default"><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"> Thursday, January 28th at<b> 11:10 am CT</b></font></font><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Where:</b> </font></font><font color="#000000">Zoom Virtual Talk (</font><b><font color="#0000ff"><a href="https://uchicagogroup.zoom.us/webinar/register/WN_LXDxvgkcQJaDvB9ptbv2cQ" target="_blank">register in advance here</a></font></b><font color="#000000">)</font></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> 
</font></font>Huda Khayrallah, Johns Hopkins University</font></p><font face="arial, sans-serif"><br></font></div><div class="gmail_default"><div style="color:rgb(0,0,0)"><p style="margin-top:0px;margin-bottom:0px"><font face="arial, sans-serif"><b>Title: </b><span style="margin:0px;font-variant-ligatures:common-ligatures">Machine Translation for All: Improving Machine Translation in Low Resource, Domain Mismatch, and Noisy Settings</span></font></p><p style="margin-top:0px;margin-bottom:0px"><span style="margin:0px;font-variant-ligatures:common-ligatures"><font face="arial, sans-serif"><br></font></span></p><p style="margin-top:0px;margin-bottom:0px"><font face="arial, sans-serif"><span style="margin:0px;font-variant-ligatures:common-ligatures"><b>Abstract:</b> </span><span style="margin:0px;font-variant-ligatures:common-ligatures"><span style="margin:0px">Machine translation uses machine learning to automatically translate text from one language to another, and it has the potential to reduce language barriers. Recent improvements, driven in part by deep neural network approaches, have made machine translation more widely usable. However, like most deep learning algorithms, neural machine translation is sensitive to the quantity and quality of its training data, and it therefore produces poor translations for some languages and styles of text. Machine translation training data typically comes in the form of parallel text: sentences paired with their translations between the two languages of interest. For most language pairs, only limited quantities of parallel text are available, which leads to a low-resource problem. Even when training data is available in the desired language pair, it is frequently formal text, which leads to a domain mismatch when models are used to translate a different type of data, such as social media or medical text. 
Neural machine translation currently performs poorly in low-resource and domain mismatch settings; my work aims to overcome these limitations and make machine translation a useful tool for all users.</span><span style="margin:0px;display:block;height:8px"></span><span style="margin:0px">In this talk, I will discuss Simulated Multiple Reference Training (SMRT; <span style="margin:0px">Khayrallah</span> et al., 2020), a method for improving translation in low-resource settings that uses a paraphraser to simulate training on all possible translations of each sentence. I will also discuss work on improving domain adaptation (<span style="margin:0px">Khayrallah</span> et al., 2018) and on analyzing the effect of noisy training data (<span style="margin:0px">Khayrallah</span> and Koehn, 2018).</span></span></font></p><p style="margin-top:0px;margin-bottom:0px"><span style="margin:0px;font-variant-ligatures:common-ligatures"><span style="margin:0px"><font face="arial, sans-serif"><br></font></span></span></p><div style="margin:0px"><div style="margin:0px"><div dir="auto" style="margin:0px;line-height:1.46668"><div style="margin:0px"><p style="margin-top:0px;margin-bottom:0px"><font face="arial, sans-serif"><b>Bio: </b><span style="margin:0px;font-variant-ligatures:common-ligatures"><span style="margin:0px">Huda</span> <span style="margin:0px">Khayrallah</span> is a PhD candidate in Computer Science at The Johns Hopkins University, where she is advised by Philipp Koehn. She is part of the Center for Language and Speech Processing and the machine translation group. She works on applied machine learning for natural language processing, primarily machine translation. Her work focuses on overcoming deep learning’s sensitivity to the quantity and quality of training data, particularly in low-resource and domain adaptation settings. In summer 2019, she was a research intern at Lilt, where she worked on translator-in-the-loop machine translation. 
She holds an MSE in Computer Science from Johns Hopkins (2017) and a BA in Computer Science from UC Berkeley (2015). More information is available on her website:<span style="margin:0px"> </span></span><a href="http://www.cs.jhu.edu/~huda" rel="noopener noreferrer" target="_blank" style="margin:0px;font-variant-ligatures:common-ligatures">http://www.cs.jhu.edu/~<span style="margin:0px">huda</span></a></font></p></div></div></div></div><font face="arial, sans-serif"><br></font></div><div id="gmail-m_-6445024901113733594gmail-m_7884601325777251486appendonsend"></div></div><div class="gmail_default"><font face="arial, sans-serif"><b style="white-space:pre-wrap">Host:</b><span style="white-space:pre-wrap"> <a href="mailto:kgimpel@ttic.edu" target="_blank">Kevin Gimpel</a></span></font></div><div class="gmail_default"><br></div><div class="gmail_default"><br></div><div class="gmail_default"><br></div></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><font face="arial, helvetica, sans-serif">Mary C. Marre</font><div><font face="arial, helvetica, sans-serif">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">6045 S. 
Kenwood Avenue</font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Room 517</font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i></div><div><i><font face="arial, helvetica, sans-serif">p:(773) 834-1757</font></i></div><div><i><font face="arial, helvetica, sans-serif">f: (773) 357-6970</font></i></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 21, 2021 at 9:08 PM Mary Marre <<a href="mailto:mmarre@ttic.edu">mmarre@ttic.edu</a>> wrote:<br></div></div></div>