[CS] TODAY: Ziyu Ye Candidacy Exam/Sep 8, 2025

Mon Sep 8 09:04:08 CDT 2025

This is an announcement of Ziyu Ye's Candidacy Exam.
===============================================
Candidate: Ziyu Ye

Date: Monday, September 08, 2025

Time:  1:30 pm CDT

Remote Location:  https://uchicago.zoom.us/my/machine?pwd=Y1dvMzBtM2xERzhXcHhuQU85dkZwUT09

Location: JCL 298

Title: Self-Play Methods in Reinforcement Learning for Language Models

Abstract: Reinforcement learning has long emphasized learning to act, with little principled understanding of learning what to experience. This thesis introduces experience shaping as an auxiliary objective jointly optimized with reward maximization, and presents practical self-play algorithms that bring concrete gains on standard benchmarks of language models.

We propose practical algorithms in both the post-training and pre-training stages of training language models. In the post-training stage, we first introduce the evolving alignment (eva) framework, which replaces the classical static training scheme with an adaptive one, where a learnable creator policy generates and schedules training contexts for the solver policy. We also present the reasoning in reasoning (rir) framework, which enables the solver to use hierarchical self-play: the solver learns to generate and search over subgoals to shape its own training experience. In the pre-training stage, we develop native reasoning models with self-supervised reward, discarding the next-token prediction paradigm in favor of sequence-level training where models learn to complete full trajectories guided by self-supervised reward signals. Together, these methods enable models to surpass the state of the art with far smaller model and dataset sizes.

Overall, we design a new family of reinforcement learning algorithms that let language-model-based agents learn to self-play, strategically generating and shaping their own training experiences, advancing them towards human-level intelligence in alignment and reasoning tasks.

Advisor: Yuxin Chen

Committee Members: Yuxin Chen, Haifeng Xu, and Kaiyu Yang