[Colloquium] TODAY: Optimizing Communication for Distributed LLM Inference

Roberto Vale via Colloquium colloquium at mailman.cs.uchicago.edu
Tue Oct 15 08:33:30 CDT 2024


UNIVERSITY OF CHICAGO
COMPUTER SCIENCE DEPARTMENT
PRESENTS

Kuntai DU
University of Chicago
Ph.D. Candidate

AI+System Seminars
Tuesday, October 15th
12:30 pm - 1:30 pm
In Person: John Crerar Library Rm 298


Title: Optimizing Communication for Distributed LLM Inference

Abstract: Previous work has identified GPU memory capacity as the primary bottleneck in LLM inference, necessitating effective KV cache management strategies. However, based on a long-term collaboration with vLLM, the most popular open-source serving engine, we observe that the landscape is shifting toward distributed LLM inference due to emerging trends such as long-context KV cache reuse, disaggregated prefilling, and multi-modal LLM inference. The central challenge thus shifts to efficient KV cache communication. This talk explores potential solutions for optimizing communication in distributed systems and argues that effective communication requires two new roles: an orchestrator and a KV cache store. These roles must collaborate closely to meet stringent service-level objectives, paving the way for scalable and efficient distributed LLM-serving systems.

Bio: Kuntai Du is a 6th-year PhD candidate at UChicago. His research focuses on data transfer for distributed inference systems, including analytics-aware video streaming in distributed video analytics settings and effective KV cache transfer for distributed LLM inference. He is the recipient of the Siebel Scholarship.

Host: Junchen Jiang