<div dir="ltr"><div class="gmail_default" style="font-family:georgia,serif;font-size:small"><div class="gmail_default"><div class="gmail_default"><div class="gmail_default"><font color="#000000" face="georgia, serif"><span style="letter-spacing:0.2px"><b>When:    </b>Wednesday, November 20th<b> at </b></span><b style="letter-spacing:0.2px"><span style="background-color:rgb(255,255,0)">11AM CT</span></b></font></div><div class="gmail_default"><b style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></b></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Where:   </b>Talk will be given<span style="background-color:rgb(255,255,0)"> </span><span style="background-color:rgb(255,255,0)"><font style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font></span>at</font></div><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font color="#000000" face="georgia, serif">                    TTIC, 6045 S. 
Kenwood Avenue</font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font color="#000000" face="georgia, serif">                    5th Floor, Room 530<b>  </b></font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></b></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Virtually: </b><span style="letter-spacing:0.2px">via Panopto (</span><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cb670e00-20e7-46a2-a6a4-b20b00f9f619" target="_blank" style="letter-spacing:0.2px">Livestream</a><span style="letter-spacing:0.2px">)</span><br></font></p><div class="gmail_default"><b style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></b></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><span style="letter-spacing:0.2px"><b>Who:      </b></span></font>Peter Hase, University of North Carolina at Chapel Hill</div><div class="gmail_default"><span style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></span></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Title:</b>       </font>AI Safety Through Interpretable and Controllable Language Models</div><div class="gmail_default"><font face="georgia, serif"><br><b 
style="color:rgb(0,0,0);letter-spacing:0.2px">Abstract: </b></font>In a 2022 survey, 37% of NLP experts agreed that "AI decisions could cause nuclear-level catastrophe" in this century. This survey was conducted prior to the release of ChatGPT. The research community’s now-common concern about catastrophic risks from AI highlights that long-standing problems in AI safety are as important as ever. In this talk, I will describe research on two core problems at the intersection of NLP and AI safety: (1) interpretability and (2) controllability. We need interpretability methods to verify that models use acceptable and generalizable reasoning to solve tasks. Controllability refers to our ability to steer individual behaviors in models on demand, which is helpful since pretrained models will need continual adjustment of specific knowledge and beliefs about the world. This talk will cover recent work on (1) open problems in interpretability, including mechanistic interpretability and chain-of-thought faithfulness, (2) fundamental problems with model editing, viewed through the lens of belief revision, and (3) scalable oversight, with a focus on weak-to-strong generalization. Together, these lines of research aim to develop rigorous technical foundations for ensuring the safety of increasingly capable AI systems.</div><div class="gmail_default"><br></div><div><div class="gmail_default"><font face="georgia, serif" color="#000000"><b>Bio:</b> </font>Peter Hase is an AI Resident at Anthropic, working on the Alignment Science team. He recently completed his PhD at the University of North Carolina at Chapel Hill. His research focuses on NLP and AI Safety, with a particular emphasis on techniques for explaining and controlling model behavior. 
He has previously worked at AI2, Google, and Meta.</div><div><font color="#000000" face="georgia, serif"><br></font></div></div></div><div><div class="gmail_default"><b><font face="georgia, serif" color="#000000">Host: <a href="mailto:klivescu@ttic.edu" target="_blank">Karen Livescu</a></font></b></div></div></div><font color="#888888"><div><br style="font-family:Arial,Helvetica,sans-serif"></div></font></div><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><b style="background-color:rgb(255,255,255)"><font color="#3d85c6">Brandie Jones </font></b><div><div><div><font color="#3d85c6"><b><i>Executive </i></b></font><b style="color:rgb(61,133,198)"><i>Administrative Assistant</i></b></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Toyota Technological Institute</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">6045 S. Kenwood Avenue</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Chicago, IL  60637</font></span></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><a href="http://www.ttic.edu" target="_blank">www.ttic.edu</a> </font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><br></font></span></div></div></div></div>