[Colloquium] Optimizing Communication for Distributed LLM Inference
Roberto Vale via Colloquium
colloquium at mailman.cs.uchicago.edu
Mon Oct 14 08:43:04 CDT 2024
UNIVERSITY OF CHICAGO
COMPUTER SCIENCE DEPARTMENT
PRESENTS
Kuntai Du
University of Chicago
Ph.D. Candidate
AI+System Seminars
Tuesday, October 14th
12:30 pm - 1:30 pm
In Person: John Crerar Library Rm 298
Title: Optimizing Communication for Distributed LLM Inference
Abstract: Previous work has identified GPU memory capacity as the primary bottleneck in LLM inference, necessitating effective KV cache management strategies. However, based on a long-term collaboration with vLLM, the most popular open-source LLM serving engine, we observe that the landscape is shifting toward distributed LLM inference due to emerging trends such as long-context KV cache reuse, disaggregated prefilling, and multi-modal LLM inference. The central challenge thus shifts to designing efficient KV cache communication mechanisms. This talk explores potential solutions for optimizing communication in distributed systems and argues that effective communication requires two new roles: an orchestrator and a KV cache store. These roles must collaborate closely to meet stringent service-level objectives, paving the way for scalable and efficient distributed LLM serving systems.
Bio: Kuntai Du is a sixth-year PhD student at UChicago. His research focuses on data transfer for distributed inference systems, including analytics-aware video streaming in distributed video analytics settings and efficient KV cache transfer for distributed LLM inference. He is a recipient of the Siebel Scholarship.
Host: Junchen Jiang