[Colloquium] REMINDER: 4/26 TTIC/UChicago NLP Seminar: Huan Sun, Ohio State University

Mary Marre mmarre at ttic.edu
Thu Apr 25 12:24:08 CDT 2024


*When:*         Friday, April 26, 2024 at 11:00 am CT


*Where:*       Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, *Room 529*


*Virtually:*     Zoom
<https://uchicago.zoom.us/j/98297764499?pwd=ajNQSTZnMHRmMENkd1hjdjlNeW1xdz09>



*Who:*          Huan Sun, Ohio State University
------------------------------

*Title:*          Powers and Peculiarities of “Reasoning” in Large Language
Models and Agents



*Abstract:* Powered by large (language/multimodal) models, an emerging type
of AI system called language agents has seen explosive growth in the past
year. In this talk, we will discuss our pioneering work on web agents [1,
2] and show the potential of large multimodal models such as GPT-4V. What
makes such language agents promising? One key contributing factor is the
general “reasoning” ability of LLMs. However, do LLMs truly reason or
understand? What matters in the de-facto Chain-of-Thought (CoT) prompting?
Are transformers fundamentally limited in compositional reasoning? In the
second part of the talk, we will briefly discuss our recent and ongoing
work that partly answers these questions. In short, we find that (1)
in-context demonstrations with *invalid* CoT rationales have little effect
on model performance [3], which implies that LLMs do not learn CoT
reasoning from in-context examples; the examples mainly serve as a trigger
to format the output. (2) Even when they can initially generate correct
step-by-step solutions, LLMs fail to maintain their belief in the truth on
a significant portion of examples when challenged with absurdly invalid
arguments [4]. (3) Contrary to some recent work claiming that transformers
are fundamentally limited in compositionality, in our ongoing work we
carefully design controlled synthetic datasets and observe that
transformers can generalize in compositional reasoning after “grokking”.
Finally, I will conclude with some thoughts on future directions.


[1] Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi
Wang, Huan Sun, Yu Su, “Mind2Web: Towards a generalist agent for the web
<https://arxiv.org/abs/2306.06070>,” The Thirty-seventh Conference on
Neural Information Processing Systems (NeurIPS'23, Spotlight).

[2] Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su, “GPT-4V(ision) is
a generalist web agent, if grounded <https://arxiv.org/abs/2401.01614>,”
Under Review, 2024.

[3] Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke
Zettlemoyer, Huan Sun, “Towards understanding chain-of-thought prompting:
An empirical study of what matters <https://arxiv.org/abs/2212.10001>,”
ACL 2023.

[4] Boshi Wang, Xiang Yue, Huan Sun, “Can ChatGPT defend its belief in
truth? Evaluating LLM reasoning via debate
<https://arxiv.org/abs/2305.13160>,” Findings of EMNLP 2023.

*Bio:* Huan Sun is an endowed CoE Innovation Scholar and tenured associate
professor in the Department of Computer Science and Engineering at The Ohio
State University. Her research interests lie in natural language processing
and artificial intelligence, with recent work on web agents, large language
model evaluation, and foundation models for chemistry. Huan received
Honorable Mentions for Best Paper Awards at ACL'23 (two papers), a SIGMOD
Research Highlight Award, a BIBM Best Paper Award, Google Research Scholar
and Google Faculty Awards, an NSF CAREER Award, an OSU Lumley Research
Award, and the SIGKDD Ph.D. Dissertation Runner-Up Award, among others. Her
team won third place in the first Alexa Prize TaskBot challenge in 2022.
Huan received her Ph.D. from the University of California, Santa Barbara
and her B.S. from the University of Science and Technology of China.


*Host:*        *Jiawei Zhou* <jzhou at ttic.edu>




Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL  60637*
*773-834-1757*
*mmarre at ttic.edu <mmarre at ttic.edu>*

