<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div class="gmail_default"><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif" color="#0000ff"><b style="background-color:rgb(255,255,0)">Note: Talk is LIVE and Virtual</b></font></p><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b>When:        </b>Friday, February 4th at <b style="background-color:rgb(255,255,0)">10:30am CT</b></font></p><div class="gmail_default"><b>Where: </b><b>      </b><font color="#000000" style="font-family:arial,sans-serif">Talk will be given</font><font color="#500050" style="font-family:arial,sans-serif"> </font><font color="#0000ff" face="arial, sans-serif" style="font-weight:bold"><u>live, in-person</u></font><font color="#0000ff" face="verdana, sans-serif" style="font-weight:bold"> </font><font face="arial, sans-serif" color="#000000">at</font></div><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">TTIC, 6045 S. 
Kenwood Avenue</font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" style="background-color:rgb(255,255,0)"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Where:</span></b><span style="color:black">       Zoom Virtual Talk (</span><b><span style="color:blue"><a href="https://uchicagogroup.zoom.us/webinar/register/WN_1KBwOci8S62CnOJ9hVGMKg" target="_blank" style="color:rgb(5,99,193)"><span style="color:rgb(17,85,204)">register in advance here</span></a></span></b><span style="color:black">)</span></font></p><p class="MsoNormal" style="margin:0in 0in 8pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Who: </span></b><span style="color:black"> </span><span style="color:rgb(80,0,80)">        </span>Rowan Zellers, University of Washington</font></p></div><div class="gmail_default"><div class="gmail_default"><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 
0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">Title:</b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">         Grounding Language by Seeing, Hearing, and Interacting</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"><font face="arial, sans-serif"><br></font></span></p><div><div><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Abstract:</b> As humans, our understanding of language is grounded in a rich mental model of “how the world works” – one that we learn through perception and interaction. We use this understanding to reason beyond what is literally said, imagining how situations might unfold in the world. Machines today struggle to make such connections, which limits how safely they can be used.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">In my talk, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. 
I will introduce a suite of approaches for constructing benchmarks, using machines in the loop to filter out spurious biases. Next, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation, using this knowledge to ground language. PIGLeT learns linguistic form and meaning – together – and outperforms text-to-text models that are orders of magnitude larger. Finally, I will introduce MERLOT, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. The model learns to jointly represent video, audio, and language, together and over time – learning multimodal and neural script knowledge representations.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">Together, these directions suggest a path forward for building machines that learn language rooted in the world.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Bio:</b> Rowan Zellers is a final-year PhD candidate at the University of Washington in Computer Science &amp; Engineering, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language, vision, sound, and the world beyond these modalities. He has been recognized with an NSF Graduate Fellowship and a NeurIPS 2021 Outstanding Paper Award. His work has appeared in several media outlets, including Wired, the Washington Post, and the New York Times. He graduated from Harvey Mudd College with a B.S. 
in Computer Science & Mathematics, and has interned at the Allen Institute for AI.</font></p></div><div><font face="arial, sans-serif"><br></font></div></div></div><div class="gmail_default"><font face="arial, sans-serif"><span style="color:rgb(17,17,17)"><b>Host</b>: </span><a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b></a></font></div></div></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 4, 2022 at 9:30 AM Mary Marre <<a href="mailto:mmarre@ttic.edu">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif" color="#0000ff"><b style="background-color:rgb(255,255,0)">Note: Talk is LIVE and Virtual</b></font></p><p class="MsoNormal" style="font-size:small;margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b>When:        </b>Friday, February 4th at <b style="background-color:rgb(255,255,0)">10:30am 
CT</b></font></p><div style="font-size:small"><b>Where: </b><b>      </b><font color="#000000" style="font-family:arial,sans-serif"><span style="background-color:rgb(255,255,255)">Talk will be given</span></font><span style="background-color:rgb(255,255,255)"><font color="#500050" style="font-family:arial,sans-serif"> </font><font color="#0000ff" face="arial, sans-serif" style="font-weight:bold"><u>live, <span>in</span>-<span>person</span></u></font></span><font color="#0000ff" face="verdana, sans-serif" style="font-weight:bold"> </font><font face="arial, sans-serif" color="#000000">at</font></div><p class="MsoNormal" style="font-size:small;margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">TTIC, 6045 S. Kenwood Avenue</font></p><p class="MsoNormal" style="font-size:small;margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="font-size:small;margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" style="background-color:rgb(255,255,0)"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="font-size:small;margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Where:</span></b><span style="color:black">       Zoom Virtual Talk (</span><b><span 
style="color:blue"><a href="https://uchicagogroup.zoom.us/webinar/register/WN_1KBwOci8S62CnOJ9hVGMKg" style="color:rgb(5,99,193)" target="_blank"><span style="color:rgb(17,85,204)">register <span>in</span> advance here</span></a></span></b><span style="color:black">)</span></font></p><p class="MsoNormal" style="font-size:small;margin:0in 0in 8pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Who: </span></b><span style="color:black"> </span><span style="color:rgb(80,0,80)">        </span>Rowan Zellers, University of Washington</font></p></div><div style="font-size:small"><div><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">Title:</b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">         Grounding Language by Seeing, Hearing, and Interacting</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span 
style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"><font face="arial, sans-serif"><br></font></span></p><div><div><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Abstract:</b> As humans, our understanding of language is grounded <span>in</span> a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what is literally said, imagining how situations might unfold <span>in</span> the world. Machines today struggle at making such connections, which limits how they can be safely used.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><span>In</span> my talk, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks, using machines <span>in</span> the loop to filter out spurious biases. Next, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation, using this knowledge to ground language. PIGLeT learns linguistic form and meaning – together – and outperforms text-to-text only models that are orders of magnitude larger. Finally, I will introduce MERLOT, which learns about situations <span>in</span> the world by watching millions of YouTube videos with transcribed speech. 
The model learns to jointly represent video, audio, and language, together and over time – learning multimodal and neural script knowledge representations.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">Together, these directions suggest a path forward for building machines that learn language rooted <span>in</span> the world.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Bio:</b> Rowan Zellers is a final year PhD candidate at the University of Washington <span>in</span> Computer Science & Engineering, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language, vision, sound, and the world beyond these modalities. He has been recognized through NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared <span>in</span> several media outlets, including Wired, the Washington Post, and the New York Times. <span>In</span> the past, he graduated from Harvey Mudd College with a B.S. <span>in</span> Computer Science & Mathematics, and has interned at the Allen Institute for AI.</font></p></div><div><font face="arial, sans-serif"><br></font></div></div></div><div><font face="arial, sans-serif"><span style="color:rgb(17,17,17)"><b>Host</b>: </span><a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b></a></font></div><br></div><div style="font-size:small"><br></div></div><div><div dir="ltr"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. 
Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 3, 2022 at 3:13 PM Mary Marre <<a href="mailto:mmarre@ttic.edu" target="_blank">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div style="font-size:small"><div><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b>When:        </b>Friday, February 4th at <b style="background-color:rgb(255,255,0)">10:30am CT</b></font></p><div><b>Where: </b><b>      </b><span style="background-color:rgb(255,255,0)"><font color="#000000" style="font-family:arial,sans-serif">Talk will be given</font><font color="#500050" style="font-family:arial,sans-serif"> </font><font color="#0000ff" face="arial, sans-serif" style="font-weight:bold"><u>live, in-person</u></font></span><font color="#0000ff" face="verdana, sans-serif" style="font-weight:bold"> </font><font face="arial, sans-serif" color="#000000">at</font></div><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" 
color="#000000">                    </font><font face="arial, sans-serif" color="#000000">TTIC, 6045 S. Kenwood Avenue</font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" style="background-color:rgb(255,255,0)"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Where:</span></b><span style="color:black">       Zoom Virtual Talk (</span><b><span style="color:blue"><a href="https://uchicagogroup.zoom.us/webinar/register/WN_1KBwOci8S62CnOJ9hVGMKg" style="color:rgb(5,99,193)" target="_blank"><span style="color:rgb(17,85,204)">register in advance here</span></a></span></b><span style="color:black">)</span></font></p><p class="MsoNormal" style="margin:0in 0in 8pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Who: </span></b><span style="color:black"> </span><span style="color:rgb(80,0,80)">        </span>Rowan Zellers, University of Washington</font></p></div><div><div><p class="MsoNormal" style="margin:0in 0in 
0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">Title:</b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">         Grounding Language by Seeing, Hearing, and Interacting</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"><font face="arial, sans-serif"><br></font></span></p><div><div><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Abstract:</b> As humans, our understanding of language is grounded in a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what is literally said, imagining how situations might unfold in the world. 
Machines today struggle at making such connections, which limits how they can be safely used.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">In my talk, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks, using machines in the loop to filter out spurious biases. Next, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation, using this knowledge to ground language. PIGLeT learns linguistic form and meaning – together – and outperforms text-to-text only models that are orders of magnitude larger. Finally, I will introduce MERLOT, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. The model learns to jointly represent video, audio, and language, together and over time – learning multimodal and neural script knowledge representations.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">Together, these directions suggest a path forward for building machines that learn language rooted in the world.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Bio:</b> Rowan Zellers is a final year PhD candidate at the University of Washington in Computer Science & Engineering, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language, vision, sound, and the world beyond these modalities. 
He has been recognized through NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets, including Wired, the Washington Post, and the New York Times. In the past, he graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics, and has interned at the Allen Institute for AI.</font></p></div><div><font face="arial, sans-serif"><br></font></div></div></div><div><font face="arial, sans-serif"><span style="color:rgb(17,17,17)"><b>Host</b>: </span><a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b></a></font></div><br></div><div><br></div></div><div><div dir="ltr"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 31, 2022 at 1:36 PM Mary Marre <<a href="mailto:mmarre@ttic.edu" target="_blank">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b style="font-size:small">When:        </b>Friday, February 4th at <b style="background-color:rgb(255,255,0)">10:30am CT</b></font></p><div style="font-size:small"><b style="background-color:rgb(255,255,255)">Where: </b><b style="background-color:rgb(255,255,255)">      </b><span style="background-color:rgb(255,255,0)"><font style="font-family:arial,sans-serif" color="#000000">Talk will be given</font><font color="#500050" style="font-family:arial,sans-serif"> </font><font color="#0000ff" face="arial, sans-serif" style="font-weight:bold"><u>live, in-person</u></font></span><font color="#0000ff" face="verdana, sans-serif" style="background-color:rgb(255,255,255);font-weight:bold"> </font><font face="arial, sans-serif" color="#000000" style="background-color:rgb(255,255,255)">at</font></div><p class="MsoNormal" style="font-size:small;margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="background-color:rgb(255,255,255)"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">TTIC, 
6045 S. Kenwood Avenue</font></span></p><p class="MsoNormal" style="font-size:small;margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="background-color:rgb(255,255,255)"><font face="arial, sans-serif" color="#000000">                    </font><font face="arial, sans-serif" color="#000000">5th Floor, Room 530<b> </b></font></span></p><p class="MsoNormal" style="font-size:small;margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" style="background-color:rgb(255,255,0)"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="font-size:small;margin:0in 0in 8pt;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Where:</span></b><span style="color:black">       Zoom Virtual Talk (</span><b><span style="color:blue"><a href="https://uchicagogroup.zoom.us/webinar/register/WN_1KBwOci8S62CnOJ9hVGMKg" style="color:rgb(5,99,193)" target="_blank"><span style="color:rgb(17,85,204)">register in advance here</span></a></span></b><span style="color:black">)</span></font></p><p class="MsoNormal" style="font-size:small;margin:0in 0in 8pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;line-height:13.91px"><font face="arial, sans-serif"><b><span style="color:black">Who: </span></b><span style="color:black"> </span><span style="color:rgb(80,0,80)">        </span>Rowan Zellers, University of Washington</font></p></div><div style="font-size:small"><div><p class="MsoNormal" style="margin:0in 0in 
0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">Title:</b><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap">         Grounding Language by Seeing, Hearing, and Interacting</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="color:rgb(60,64,67);letter-spacing:0.2px;white-space:pre-wrap"><font face="arial, sans-serif"><br></font></span></p><div><div><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Abstract:</b> As humans, our understanding of language is grounded in a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what is literally said, imagining how situations might unfold in the world. 
Machines today struggle at making such connections, which limits how they can be safely used.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">In my talk, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks, using machines in the loop to filter out spurious biases. Next, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation, using this knowledge to ground language. PIGLeT learns linguistic form and meaning – together – and outperforms text-to-text only models that are orders of magnitude larger. Finally, I will introduce MERLOT, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. The model learns to jointly represent video, audio, and language, together and over time – learning multimodal and neural script knowledge representations.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif">Together, these directions suggest a path forward for building machines that learn language rooted in the world.</font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in;line-height:12pt"><font face="arial, sans-serif"><b>Bio:</b> Rowan Zellers is a final year PhD candidate at the University of Washington in Computer Science & Engineering, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language, vision, sound, and the world beyond these modalities. 
He has been recognized through an NSF Graduate Research Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets, including Wired, the Washington Post, and the New York Times. He graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics and has interned at the Allen Institute for AI.</font></p></div><div><font face="arial, sans-serif"><br></font></div></div></div><div><font face="arial, sans-serif"><span style="color:rgb(17,17,17)"><b>Host</b>: </span><a href="mailto:klivescu@ttic.edu" target="_blank"><b>Karen Livescu</b></a></font></div><br></div><div style="font-size:small"><br></div><div style="font-size:small"><br></div></div><div><div dir="ltr"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div><br></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div>