[CS] Ziyi Zhang MS Presentation May 21, 2025

via cs cs at mailman.cs.uchicago.edu
Mon May 19 10:14:09 CDT 2025


This is an announcement of Ziyi Zhang's MS Presentation
===============================================
Candidate: Ziyi Zhang

Date: Wednesday, May 21, 2025

Time: 12 pm CDT

Location: JCL 346

Title: A comprehensive benchmark on runtime configurations of speculative decoding

Abstract: Speculative decoding has emerged as a powerful technique to reduce latency in large language model (LLM) inference by combining a lightweight draft model with a powerful target model to generate and verify multiple tokens in each batched inference step. Current systems typically fix runtime configurations within a limited configuration space, such as batch sizes and the number of tree layers, which leads to suboptimal generation speed across different datasets and context lengths. In this work, we systematically explore a broader configuration space, including the number of tree layers, batch sizes, draft model sizes, and precisions, under different target models and datasets. By integrating diverse models and precisions into our framework and conducting extensive benchmarking, we provide guidance on selecting optimal runtime configurations for speculative decoding. This research aims to inspire adaptive speculative decoding solutions and inform future draft model design.

Advisor: Hank Hoffmann

Committee: Hank Hoffmann, Haryadi Gunawi, Youjie Li


