<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div class="gmail_default"><font face="arial, sans-serif"><font style="color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="color:rgb(0,0,0)"> Wednesday, March <span class="gmail-il">1</span></font><span class="gmail_default" style="color:rgb(0,0,0)">, 2023</span><font style="color:rgb(0,0,0)"> at</font><b style="color:rgb(0,0,0)"> <u>11:30</u></b><b><u><font color="#000000"> a</font></u></b><b><u><font color="#000000">m CT</font></u><font color="#000000"> </font></b></font></font><br></font></div><div class="gmail_default"><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b style="font-family:arial,sans-serif"><font color="#500050"><br></font></b></p><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b style="font-family:arial,sans-serif"><font color="#500050">Where: </font></b><font color="#000000" style="font-family:arial,sans-serif"><span class="gmail-il">Talk</span> will be given </font><font color="#000000" style="font-family:arial,sans-serif;font-weight:bold"><u>live, in-person</u></font><font style="font-family:arial,sans-serif;font-weight:bold"> </font><span style="font-family:arial,sans-serif">at</span><br></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050"> </font><font color="#000000"> TTIC, 6045 S. 
Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"> 5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">Virtually:</b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"> <i>via</i> Panopto </span>(<b><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cb224c62-d7b4-47bf-98cf-afb2010b249c" target="_blank">livestream</a></b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">)</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"><br></span></font></p><p class="MsoNormal" style="margin:0in 0in 
0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font style="font-family:arial,sans-serif;vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050"> </font><font color="#000000"><font color="#500050"> </font></font></font></font><span style="font-family:arial,sans-serif;color:rgb(34,34,34)"><span class="gmail-il">Ofir</span> Press, University of Washington</span><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><br></font></p><div class="MsoNormal" align="center" style="margin:0in 0in 8pt;color:rgb(80,0,80);text-align:center;line-height:15.6933px"><hr size="2" width="100%" align="center"></div><div><font face="arial, sans-serif"><b>Title: </b> Guidance Helps Where Scale Doesn't in Language Modeling</font></div><div><b style="font-family:arial,sans-serif"><br></b></div><div><b style="font-family:arial,sans-serif">Abstract:</b><span style="font-family:arial,sans-serif"> </span><font color="#000000" style="font-family:arial,sans-serif">Language models (LMs) are at the core of almost all state-of-the-art natural language processing systems on almost every benchmark. Recent papers, such as Brown et al. 2020 and Hoffmann et al. 2022, have shown that scaling up the size of these models leads to better results. But is scaling all we need in order to improve language models?</font></div><div><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">In this <span class="gmail-il">talk</span> I argue that the answer is no, by presenting three studies that show properties of LMs that do not improve with scale. 
In addition, I will show how to tackle these issues without actually increasing the size on disk, memory usage, or runtime of the LM. In each case, I accomplish this by adding a new kind of guidance to the model. </font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">In Press & Wolf 2017 we show that the decoding mechanism in LMs contains word representations, and that in models of different sizes, the decoder word representations are of lower quality than the ones in the encoder. We then show that by using the same representations twice (in both the encoder and the decoder) we improve LM performance while decreasing its size. </font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">Memory constraints mean that LMs must be trained on limited segments of text. For example, GPT-<span class="gmail-il">3</span> (Brown et al. 2020) was trained on text segments that are 2,048 tokens long. Can these models summarize text sequences that are longer than the ones they observed at training? Can they make code predictions for code files that are longer than the ones they were shown during training? In Press et al. 2021 we show that existing LMs cannot process text segments that are longer than the ones they were trained on. We present a new method (ALiBi) that allows LMs to efficiently consume sequences that are longer than the ones they observed at training. ALiBi achieves this by guiding the LM to pay less attention to words that are farther away. 
</font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">Finally, in Press et al. 2022 we show that LMs are able to reason over facts observed during training to answer novel questions. But in about 40% of cases, they fail at basic reasoning over facts that they are able to recall, and this does not improve with scale. We show that by adding guidance to the way we prompt LMs, by having them ask and answer sub-questions before answering the main complex question, we are able to substantially improve their reasoning capabilities. </font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"><br></font></p><p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">These methods have been integrated into many state-of-the-art language and translation models, including OpenAI's GPT, Google's BERT, BigScience's BLOOM, and Microsoft's, Meta's, and Amazon's translation models. </font></p></div><div><br></div><div><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:19.5px"><font face="arial, sans-serif"><b><span style="color:black;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial">Bio: </span></b><span style="color:black;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial">I am a PhD candidate (ABD) at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where I am very fortunate to be advised by Noah Smith. 
During my PhD I spent two years as a visiting researcher at Facebook AI Research Labs on Luke Zettlemoyer's team where I mainly worked with Mike Lewis. Prior to that, in the summer of 2019 I interned at Facebook AI Research with Omer Levy. Towards the end of my PhD I spent half a year as a visiting researcher at MosaicML on Jonathan Frankle's team. Before starting my PhD I completed my Bachelor's and Master's degrees in Computer Science at Tel Aviv University (where I was advised by Lior Wolf and also worked with Jonathan Berant). Between my Bachelor's and Master's degrees I was a software developer for a year.</span></font></p><div><div><font color="#000000" face="arial, sans-serif"><b><br></b></font></div><div><font color="#000000" face="arial, sans-serif"><b>Host:</b> <a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b><br></a></font></div></div></div><div><br></div><div><br></div></div></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div></div>