<div dir="ltr"><div class="gmail_default"><div class="gmail_default"><font face="arial, sans-serif"><font style="color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b>    </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="color:rgb(0,0,0)">    Wednesday, March 1</font><span class="gmail_default" style="color:rgb(0,0,0)">, 2023</span><font style="color:rgb(0,0,0)"> at</font><b style="color:rgb(0,0,0)"> <u>11:30</u></b><b><u><font color="#000000"> a</font></u></b><b><u><font color="#000000">m CT</font></u><font color="#000000">   </font></b></font></font><br></font></div><div class="gmail_default"><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b style="font-family:arial,sans-serif"><font color="#500050"><br></font></b></p><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b style="font-family:arial,sans-serif"><font color="#500050">Where:       </font></b><font color="#000000" style="font-family:arial,sans-serif">Talk will be given </font><font color="#000000" style="font-family:arial,sans-serif;font-weight:bold"><u>live, in-person</u></font><font style="font-family:arial,sans-serif;font-weight:bold"> </font><span style="font-family:arial,sans-serif">at</span><br></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050">               </font><font color="#000000">    TTIC, 6045 S. 
Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                   5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">Virtually:</b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">   <i>via</i> Panopto </span>(<b><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cb224c62-d7b4-47bf-98cf-afb2010b249c" target="_blank">livestream</a></b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">)</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"><br></span></font></p><p class="MsoNormal" style="margin:0in 0in 
0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font style="font-family:arial,sans-serif;vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b>       </font></font><span style="font-family:arial,sans-serif;color:rgb(34,34,34)">Ofir Press, University of Washington</span><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><br></font></p><div class="MsoNormal" align="center" style="margin:0in 0in 8pt;color:rgb(80,0,80);text-align:center;line-height:15.6933px"><hr size="2" width="100%" align="center"></div><div><font face="arial, sans-serif"><b>Title:   </b>     Guidance Helps Where Scale Doesn't in Language Modeling<br></font></div><div><font face="arial, sans-serif"><b>Abstract:</b> <font color="#000000">Language models (LMs) are at the
core of almost all state-of-the-art natural language processing systems on
almost every benchmark. Recent papers, such as Brown et al. 2020 and Hoffmann
et al. 2022, have shown that scaling up the size of these models leads to better
results. But is scaling all we need in order to improve language models?</font><br></font></div><div>

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">In this talk I argue that the
answer is no, by presenting three studies that show properties of LMs that
do not improve with scale. In addition, I will show how to tackle these issues
without increasing the size on disk, memory usage, or runtime of the
LM. In each case, I accomplish this by adding a new kind of guidance to the
model.</font></p>

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"> </font></p>

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">In Press &amp; Wolf 2017 we show
that the decoding mechanism in LMs contains word representations, and that in
models of different sizes, the decoder word representations are of lower
quality than the ones in the encoder. We then show that by using the same
representations twice (in both the encoder and the decoder) we improve LM
performance while decreasing its size.</font></p>
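The representation sharing described above is commonly known as weight tying. A minimal sketch of the idea (an assumed toy architecture, not the authors' code):

```python
import torch
import torch.nn as nn

class TinyTiedLM(nn.Module):
    """Toy LM whose output projection reuses ("ties") the input embedding."""

    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: the decoder shares the embedding's parameter tensor,
        # so the model stores one vocab-sized matrix instead of two.
        self.decoder.weight = self.embed.weight

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.decoder(h)  # logits over the vocabulary

model = TinyTiedLM()
assert model.decoder.weight is model.embed.weight  # one shared tensor
```

Because the two matrices are one parameter, the model shrinks while the shared representations are trained by gradients from both roles.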

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"> </font></p>

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">Memory constraints imply that LMs
have to be trained on limited segments of text. For example, GPT-3 (Brown et
al. 2020) was trained on text segments that are 2,048 tokens long. Can these
models summarize text sequences that are longer than the ones they observed
during training? Can they make code predictions for code files that are longer
than the ones they were shown during training? In Press et al. 2021 we show
that existing LMs cannot process text segments that are longer than the ones
they were trained on. We present a new method (ALiBi) that allows LMs to
efficiently consume sequences that are longer than the ones they observed
during training. ALiBi achieves this by guiding the LM to pay less attention to
words that are further away.</font></p>
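The distance-based guidance above can be sketched as a per-head linear bias added to the pre-softmax attention scores. This is an illustrative reconstruction, not the paper's code; the geometric head slopes follow the scheme described in Press et al. 2021:

```python
import torch

def alibi_bias(seq_len, num_heads):
    """Linear attention bias: distant keys receive a larger penalty."""
    # Head-specific slopes form a geometric sequence (e.g. 1/2, 1/4, ...).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads)
                           for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i; clamp at 0 so keys at/after the query get no bias
    distance = pos[None, :] - pos[:, None]
    distance = torch.minimum(distance, torch.zeros_like(distance))
    # bias[h, i, j] = slope_h * (j - i): more negative for keys further back
    return slopes[:, None, None] * distance

bias = alibi_bias(seq_len=5, num_heads=2)
# Used as: attention_scores + bias (then the usual causal mask and softmax).
```

Because the bias is a fixed function of relative distance rather than a learned position embedding, the same formula extends to sequence lengths never seen in training.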

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"> </font></p>

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">Finally, in Press et al. 2022 we
show that LMs are able to reason over facts observed during training to answer
novel questions that they have never seen before. But in about 40% of
cases, they are not able to perform basic reasoning over facts that they are
able to recall, and this does not improve with scale. We show that by adding
guidance to the way we prompt LMs, having them ask and answer sub-questions
before answering the main complex question, we are able to substantially
improve their reasoning capabilities.</font></p>
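The sub-question guidance above can be sketched as a prompt template in which a worked demonstration decomposes a complex question before the model sees a new one. The wording here is illustrative, not the paper's verbatim template:

```python
# One demonstration that decomposes a two-hop question into sub-questions,
# followed by a slot for the new question the LM should answer the same way.
PROMPT_TEMPLATE = """Question: Who was president of the U.S. when superconductivity was discovered?
Are follow up questions needed here: Yes.
Follow up: When was superconductivity discovered?
Intermediate answer: Superconductivity was discovered in 1911.
Follow up: Who was president of the U.S. in 1911?
Intermediate answer: William Howard Taft.
So the final answer is: William Howard Taft.

Question: {new_question}
Are follow up questions needed here:"""

def build_prompt(new_question):
    """Fill the template; the LM continues by asking its own sub-questions."""
    return PROMPT_TEMPLATE.format(new_question=new_question)
```

The demonstration shows the model the ask-then-answer format, so its continuation decomposes the new question into recallable single-hop facts before committing to a final answer.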

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif"><br>
</font></p>

<p class="MsoNormal" style="margin:0in;line-height:normal"><font color="#000000" face="arial, sans-serif">These methods have been integrated
into many state-of-the-art language and translation models, including OpenAI's
GPT, Google's BERT, BigScience's BLOOM, and Microsoft's, Meta's, and Amazon's
translation models. </font></p></div><div><br></div><div><p class="MsoNormal" style="line-height:150%;margin:0in 0in 8pt"><font face="arial, sans-serif"><b><span style="color:black;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial">Bio: </span></b><span style="color:black;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial">I am a PhD candidate (ABD) at the Paul G. Allen School of Computer
Science &amp; Engineering at the University of Washington, where I am very
fortunate to be advised by Noah Smith. During my PhD I spent two years as a
visiting researcher at Facebook AI Research Labs on Luke Zettlemoyer's team
where I mainly worked with Mike Lewis. Prior to that, in the summer of 2019 I
interned at Facebook AI Research with Omer Levy. Towards the end of my PhD I
spent half a year as a visiting researcher at MosaicML on Jonathan Frankle's
team. Before starting my PhD I completed my Bachelor's and Master's degrees in
Computer Science at Tel Aviv University (where I was advised by Lior Wolf and
also worked with Jonathan Berant). Between my Bachelor's and Master's degrees I
was a software developer for a year.</span></font></p><div><div><font color="#000000" face="arial, sans-serif"><b><br></b></font></div><div><font color="#000000" face="arial, sans-serif"><b>Host:</b> <a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b><br></a></font></div></div></div><div style="font-size:small"><br></div><div style="font-size:small"><br></div><div style="font-size:small"><br></div></div></div><div><div dir="ltr" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div></div>