<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div class="gmail_default"><div class="gmail_default" style="color:rgb(80,0,80)"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b>    </font></font><font style="font-family:arial,sans-serif;vertical-align:inherit"><font style="vertical-align:inherit"><font style="color:rgb(0,0,0)">     Fri</font><span class="gmail_default" style="color:rgb(0,0,0)">day, April 26<span class="gmail_default">, </span>2024</span><font style="color:rgb(0,0,0)"> at</font><b style="color:rgb(0,0,0)"> <span style="background-color:rgb(255,255,0)"><u>11:00</u></span></b><b><u><font color="#000000" style="background-color:rgb(255,255,0)"> am</font></u></b><b><u><font color="#000000" style="background-color:rgb(255,255,0)"> CT</font></u><font color="#000000"><u> </u>  </font></b></font></font><br></div><div class="gmail_default"><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b style="font-family:arial,sans-serif"><font color="#500050"><br></font></b></p><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b style="font-family:arial,sans-serif"><font color="#500050">Where:       </font></b><font color="#000000" style="font-family:arial,sans-serif">Talk will be given </font><font color="#000000" style="font-family:arial,sans-serif;font-weight:bold"><u>live, in-person</u></font><font style="font-family:arial,sans-serif;font-weight:bold"> </font><span style="font-family:arial,sans-serif">at</span><br></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, 
sans-serif"><font color="#500050">               </font><font color="#000000">    TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                   5th Floor, <b>Room 529</b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000"><b><br></b></font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-size:14px;color:rgb(60,64,67);font-family:Roboto,Helvetica,Arial,sans-serif">Virtually:</b><font color="#3c4043" style="font-family:Roboto,Helvetica,Arial,sans-serif;font-size:14px">  </font><font color="#0000ff"><b><span style="font-family:Roboto,Helvetica,Arial,sans-serif;font-size:14px">  </span></b></font><span style="color:rgb(60,64,67);font-family:Roboto,Helvetica,Arial,sans-serif;font-size:14px"> </span><a href="https://uchicago.zoom.us/j/98297764499?pwd=ajNQSTZnMHRmMENkd1hjdjlNeW1xdz09" id="m_8873285558142120770m_-5957381289012536505m_-9069311980498217442m_-7939295064652347906m_6488095791192162465gmail-ow5237" style="color:rgb(26,115,232);font-family:Roboto,Helvetica,Arial,sans-serif;font-size:14px" target="_blank"><b>zoom</b></a></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b 
style="font-family:georgia,serif;color:rgb(60,64,67);letter-spacing:0.2px"><font size="1">                  </font></b><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b style="font-family:arial,sans-serif">Who: </b><font face="arial, sans-serif"> </font><font color="#500050" style="font-family:arial,sans-serif">    </font><font color="#000000"><font color="#500050"><font face="arial, sans-serif">    </font></font></font></font></font>Huan Sun, Ohio State University</p></div></div><div class="gmail_default"><div dir="ltr"><div><div class="MsoNormal" align="center" style="margin:0in 0in 8pt;text-align:center;line-height:15.6933px;font-size:11pt;font-family:Calibri,sans-serif"><hr size="3" width="100%" noshade align="center" style="color:rgb(46,116,181)"></div></div><div><div dir="ltr"><div><p class="MsoNormal"><span style="color:rgb(37,38,37)"><font face="arial, sans-serif"><b>Title:          </b></font></span><span style="font-family:arial,sans-serif;color:rgb(37,38,37)">Powers and Peculiarities of “Reasoning” in Large Language Models and Agents</span></p></div><div><p class="MsoNormal"><font face="arial, sans-serif"> </font></p></div><div><p class="MsoNormal"><span style="color:rgb(37,38,37)"><font face="arial, sans-serif"><b>Abstract:</b> </font></span><span style="font-family:arial,sans-serif;color:rgb(37,38,37)">Language agents, an emerging type of AI system powered by large (language/multimodal) models, have seen explosive growth in the past year. In this talk, we will discuss our pioneering work on web agents [1, 2] and show the potential of large multimodal models such as GPT-4V. What makes such language agents promising?
One key contributing factor is the general “reasoning” ability of large language models (LLMs). However, do LLMs truly reason or understand? What matters in Chain-of-Thought (CoT) prompting, the de facto standard? Are transformers fundamentally limited in compositional reasoning? In the second part of the talk, we will briefly discuss our recent and ongoing work that partly answers these questions. In short, we find that (1) in-context demonstrations with </span><i style="font-family:arial,sans-serif;color:rgb(37,38,37)">invalid</i><span style="font-family:arial,sans-serif;color:rgb(37,38,37)"> CoT rationales do not substantially affect model performance [3]. This implies that LLMs do not learn CoT reasoning from in-context examples, which mainly serve as a trigger for the output format. (2)</span><span style="font-family:arial,sans-serif;color:rgb(37,38,37);background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"> Even when they initially generate correct step-by-step solutions, LLMs cannot maintain their belief in the truth for a significant portion of examples when challenged with absurdly invalid arguments [4]. </span><span style="font-family:arial,sans-serif;color:rgb(37,38,37)">(3) Contrary to some recent work claiming that transformers are fundamentally limited in compositionality, our ongoing work uses carefully designed, controlled synthetic datasets and finds that transformers can generalize in compositional reasoning after “grokking”. 
Finally, we will conclude with some thoughts on future directions.</span></p><div><div><p class="MsoNormal"><font face="arial, sans-serif"><span style="color:black"><br>[1] Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, Yu Su, “</span><a href="https://arxiv.org/abs/2306.06070" target="_blank">Mind2Web: Towards a generalist agent for the web</a><span style="color:black">,” The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS'23, Spotlight).</span></font></p></div><div><p class="MsoNormal"><font face="arial, sans-serif"><span style="color:black">[2] Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su, “</span><a href="https://arxiv.org/abs/2401.01614" target="_blank">GPT-4V(ision) is a generalist web agent, if grounded</a><span style="color:black">,” under review, 2024.</span></font></p></div><div><p class="MsoNormal"><font face="arial, sans-serif"><span style="color:black">[3] Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun, “</span><a href="https://arxiv.org/abs/2212.10001" target="_blank">Towards understanding chain-of-thought prompting: An empirical study of what matters</a><span style="color:black">,” ACL 2023.</span></font></p></div><div><p class="MsoNormal"><font face="arial, sans-serif"><span style="color:black">[4] Boshi Wang, Xiang Yue, Huan Sun, “</span><a href="https://arxiv.org/abs/2305.13160" target="_blank">Can ChatGPT defend its belief in truth? Evaluating LLM reasoning via debate</a><span style="color:black">,” Findings of EMNLP 2023.</span></font></p></div></div><div><br></div></div><div><p class="MsoNormal" style="margin-bottom:12pt"><span style="color:black"><font face="arial, sans-serif"><b>Bio: </b></font></span><span style="color:black;font-family:arial,sans-serif;text-align:justify">Huan Sun is an endowed CoE Innovation Scholar and tenured associate professor in the Department of Computer Science and Engineering at The Ohio State University. Her research interests lie in natural language processing and artificial intelligence, with recent work on web agents, large language model evaluation, and foundation models for chemistry. Huan received Honorable Mentions for Best Paper Awards at ACL’23 (two papers), a SIGMOD Research Highlight Award, a BIBM Best Paper Award, Google Research Scholar and Google Faculty Awards, an NSF CAREER Award, the OSU Lumley Research Award, and the SIGKDD Ph.D. Dissertation Runner-Up Award, among others. Her team won third place in the first Alexa Prize TaskBot Challenge in 2022. Huan received her Ph.D. from the University of California, Santa Barbara, and her B.S. from the University of Science and Technology of China.</span></p></div><div><p class="MsoNormal" style="text-align:justify"><b style="font-family:arial,sans-serif"><br></b></p><p class="MsoNormal" style="text-align:justify"><b style="font-family:arial,sans-serif">Host: </b><a href="mailto:jzhou@ttic.edu" style="font-family:arial,sans-serif" target="_blank"><b>Jiawei Zhou</b></a><br></p></div></div></div><div><br></div><div><br></div><div><br></div><div><br></div></div></div></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. 
Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div></div>