[CS] REMINDER: Chaoqi Wang Candidacy Exam/Nov. 20th
via cs
cs at mailman.cs.uchicago.edu
Fri Nov 8 13:01:47 CST 2024
This is an announcement of Chaoqi Wang's Candidacy Exam.
===============================================
Candidate: Chaoqi Wang
Date: Wednesday, November 20
Time: 4PM - 5PM CST
Location: JCL 390
Remote Location: https://uchicago.zoom.us/j/4959891834?pwd=TXRsbmtGUkJhNWpvZk9aVHBDYjFHdz09
Title: Towards Robust Alignment of Language Models with Human Preferences
Abstract: The rapid advancement of large language models (LLMs) has improved natural language understanding and generation, yet aligning these models with human preferences remains challenging due to safety concerns, training complexities, and biases from spurious correlations. This thesis introduces new optimization techniques and bias mitigation strategies to improve LLM alignment with human values.
We first present $f$-Direct Preference Optimization ($f$-DPO), an extension of Direct Preference Optimization that uses a family of $f$-divergences to simplify the relationship between the reward function and the optimal policy, eliminating the need for normalizing constants. Empirical results show that $f$-DPO balances alignment performance and generation diversity, surpassing traditional Proximal Policy Optimization methods in divergence efficiency.

Next, we address the limitations of single-sample comparisons by proposing Multi-sample Direct Preference Optimization (mDPO) and Multi-sample Identity Preference Optimization (mIPO). These methods use group-wise preferences to optimize collective characteristics of model outputs, enhancing diversity and reducing bias more effectively than single-sample approaches, especially in noisy label environments.

Finally, we incorporate causal inference into the alignment process with causal reward modeling, enforcing counterfactual invariance to reduce length, sycophancy, concept, and discrimination biases. This approach ensures more reliable and fair alignment of LLMs with human preferences.
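For readers unfamiliar with the objective being generalized, the sketch below shows the standard DPO loss and the form an $f$-divergence variant might take, with $f'$ denoting the derivative of the divergence generator; this is an illustrative reconstruction from the abstract, not the exact formulation defended in the thesis.

\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
-\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right],
\]

\[
\mathcal{L}_{f\text{-}\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log\sigma\!\left(\beta\, f'\!\left(\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}\right)
-\beta\, f'\!\left(\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right)\right].
\]

The reverse KL choice $f(u) = u\log u$, with $f'(u) = \log u + 1$, recovers standard DPO since the constant cancels in the difference; the conditions under which the normalizing constant drops out for other divergences are established in the thesis.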
Overall, the optimization frameworks and bias mitigation strategies in this thesis offer practical improvements to alignment workflows, contributing to the development of trustworthy AI systems that adhere to ethical standards and reflect human preferences.
Advisor: Yuxin Chen
Committee: Yuxin Chen, Ari Holtzman and Haifeng Xu