[Theory] NOW: [Talks at TTIC] 12/10 Young Researcher Seminar Series: Jiafei Duan, University of Washington
Brandie Jones via Theory
theory at mailman.cs.uchicago.edu
Wed Dec 10 10:55:00 CST 2025
*When:* Wednesday, December 10th at *11am CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* via Panopto (livestream
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=9916e727-d7c9-45aa-afcb-b39b01430412>)
*Who:* Jiafei Duan, University of Washington
*Title:* Towards Robotics Foundation Models that can Reason.
*Abstract:* In recent years, we have witnessed remarkable progress in
generative AI, particularly in language and visual understanding and
generation. This leap has been fueled by unprecedentedly large image–text
datasets and the scaling of large language and vision models trained on
them. Increasingly, these advances are being leveraged to equip robots
with open-world visual understanding and reasoning capabilities.
Yet, despite these advances, scaling such models for robotics remains
challenging due to the scarcity of large-scale, high-quality robot
interaction data, limiting their ability to generalize and truly reason
about actions in the real world. Nonetheless, promising results are
emerging from using multimodal large language models (MLLMs) as the
backbone of robotic systems, especially in enabling the acquisition of
low-level skills required for robust deployment in everyday household
settings.
In this talk, I will present three recent works that aim to bridge the gap
between rich semantic world knowledge in MLLMs and actionable robot
control. I will begin with AHA, a vision-language model that reasons about
failures in robotic manipulation and improves the robustness of existing
systems. Building on this, I will introduce SAM2Act, a 3D generalist
robotic model with a memory-centric architecture capable of performing
high-precision manipulation tasks while retaining and reasoning over past
observations. Finally, I will present MolmoAct, AI2’s flagship robotic
foundation model for spatial reasoning, designed as a generalist system
that can be post-trained for a wide range of downstream manipulation tasks.
*Bio:* Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering
at the University of Washington, advised by Professors Dieter Fox and
Ranjay Krishna. His research focuses on foundation models for robotics,
with an emphasis on developing scalable data collection and generation
methods, grounding vision-language models in robotic reasoning, and
advancing robust generalization in robot learning. His work has been
featured in MIT Technology Review, GeekWire, VentureBeat, and Business
Wire.
Jiafei’s research has been published in top AI and robotics venues,
including ICLR, ICML, RSS, CoRL, ECCV, IJCAI, CoLM, and EMNLP, and has
earned awards such as Best Paper at Ubiquitous Robots 2023 and a Spotlight
at ICLR 2024. He is a recipient of both the A*STAR National Science PhD
Scholarship and the A*STAR Undergraduate Scholarship.
*Host: Matt Walter <mwalter at ttic.edu>*
*Brandie Jones*
*Executive Administrative Assistant*
*Outreach Administrator*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL 60637
www.ttic.edu
*OOO: November 24th - December 7th*