Join Simon Mo, a PhD student at the Berkeley Sky Computing Lab and co-leader of the vLLM project, as he shares insights at AMD Advancing AI 2025. This talk traces the vLLM project's journey to build the fastest and easiest-to-use open-source LLM inference and serving engine. Simon discusses the collaboration with AMD, highlighting performance gains on the AMD Instinct™ MI300X GPU. Learn about the innovative scheduling framework, piecewise device graphs, and optimization techniques such as prefix caching and speculative decoding. Discover how vLLM integrates with the AMD ROCm™ software platform to deliver lower latency and higher throughput for LLMs.
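
To give a flavor of the features mentioned above, here is a minimal sketch (not from the talk itself) of running vLLM's offline Python API with prefix caching enabled; the same API works on ROCm builds of vLLM, and the model name and sampling values are illustrative only:

```python
from vllm import LLM, SamplingParams

# enable_prefix_caching lets vLLM reuse KV-cache blocks for shared
# prompt prefixes across requests, reducing redundant prefill work.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    enable_prefix_caching=True,
)

sampling = SamplingParams(temperature=0.8, max_tokens=64)

# Requests sharing a common prompt prefix benefit from the cached blocks.
outputs = llm.generate(
    ["Explain prefix caching in one sentence.",
     "Explain prefix caching in one sentence. Now give an example."],
    sampling,
)
for out in outputs:
    print(out.outputs[0].text)
```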

Find the resources you need to develop using AMD products:

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.









