[Colloquium] 3/6 Can Helpful Assistants be Unpredictable? Limits of Aligned LLMs

Markayla Epps via Colloquium colloquium at mailman.cs.uchicago.edu
Wed Mar 5 11:03:36 CST 2025


UNIVERSITY OF CHICAGO
COMPUTER SCIENCE DEPARTMENT
PRESENTS

Peter West
Assistant Professor
University of British Columbia


Thursday, March 6th, 2025
1:00PM – 2:00PM CT. Lunch will be provided at 12:30PM CT.
Location: TTIC 530
Zoom Link: <https://uchicago.zoom.us/j/97708787366?pwd=qD9KdtzwGCTPuciAfr6iaap3NubZgy.1#success>

Title: Can Helpful Assistants be Unpredictable? Limits of Aligned LLMs

Abstract: The majority of public-facing language models have undergone some form of alignment, a family of techniques (e.g., reinforcement learning from human feedback) that aim to make models safer, more honest, and better at following instructions. In this talk, I will investigate the downsides of aligning LLMs. While the process improves model performance across a broad range of benchmark tasks, particularly those for which a "correct" answer is clear, it seems to diminish some of the most interesting aspects of LLMs, including unpredictability and the generation of text that humans find creative.

Bio: Peter <https://peterwestai.notion.site/> is an Assistant Professor at UBC and a recent postdoc at the Stanford Institute for Human-Centered AI working in Natural Language Processing. His research broadly studies the capabilities and limits of large language models (and other generative AI systems). His work has been recognized with multiple awards, including best method paper at NAACL 2022, outstanding paper at ACL 2023, and outstanding paper at EMNLP 2023.






--
Markayla Epps
Business Assistant
Computer Science Department
The University of Chicago
5730 S. Ellis - JCL 212
Chicago, IL 60637
Markayla at uchicago.edu
(773) 702-0723
https://cs.uchicago.edu/

