[CS] Ziyi Zhang MS Presentation May 21, 2025
via cs
cs at mailman.cs.uchicago.edu
Mon May 19 10:14:09 CDT 2025
This is an announcement of Ziyi Zhang's MS Presentation
===============================================
Candidate: Ziyi Zhang
Date: Wednesday, May 21, 2025
Time: 12 pm CDT
Location: JCL 346
Title: A comprehensive benchmark on runtime configurations of speculative decoding
Abstract: Speculative decoding has emerged as a powerful technique to reduce latency in large language model (LLM) inference by combining a lightweight draft model with a powerful target model to generate and verify multiple tokens per batched inference step. Current systems typically fix runtime configurations within a limited configuration space, such as batch sizes and the number of tree layers, which leads to suboptimal generation speed across different datasets, context lengths, and other conditions. In this work, we systematically explore a broader configuration space, including the number of tree layers, batch sizes, draft model sizes, and precisions, under different target models and datasets. By integrating diverse models and precisions into our framework and conducting extensive benchmarking, we provide guidance on selecting optimal runtime configurations for speculative decoding. This research aims to inspire adaptive speculative decoding solutions and inform future draft model design.
Advisor: Hank Hoffmann
Committee: Hank Hoffmann, Haryadi Gunawi, Youjie Li