[CS] UPDATE: Ziyu Ye Dissertation Defense/Nov 5, 2025

via cs cs at mailman.cs.uchicago.edu
Tue Nov 4 08:53:01 CST 2025


This is an announcement of Ziyu Ye's Dissertation Defense.
===============================================
Candidate: Ziyu Ye

Date: Wednesday, November 05, 2025

Time: 9 am CST

Remote Location: meet.google.com/kft-dhdt-evw

Title: Self-Play Methods in Reinforcement Learning for Language Models

Abstract: Reinforcement learning has long emphasized learning to act, with a less principled understanding of learning what to experience. This thesis introduces experience shaping as an auxiliary objective jointly optimized with reward maximization, and presents practical self-play algorithms that yield concrete gains on standard language model benchmarks.

We propose practical algorithms for both the post-training and pre-training stages of language models. In the post-training stage, we first introduce the evolving alignment framework, which replaces the classical static training scheme with an adaptive one, where a learnable creator policy generates and schedules training contexts for the solver policy. We also present the reasoning in the reasoning framework, which enables the solver to use hierarchical self-play: the solver learns to generate and search over subgoals to shape its own training experience. In the pre-training stage, we develop native reasoning models with self-supervised reward, discarding the next-token prediction paradigm in favor of sequence-level training, where models learn to complete full trajectories guided by self-supervised reward signals. Together, these methods enable models to surpass the state of the art with far smaller model and dataset sizes.

Overall, we design a new family of reinforcement learning algorithms that let language-model-based agents learn to self-play, strategically generating and shaping their own training experiences, and advancing them towards human-level intelligence in alignment and reasoning tasks.

Advisor: Yuxin Chen

Committee: Yuxin Chen, Haifeng Xu, Kaiyu Yang


