Join Simon Mo, a PhD student at the Berkeley Sky Computing Lab and co-leader of the vLLM project, as he shares insights at AMD Advancing AI 2025. This talk traces the vLLM project's journey to build the fastest and easiest-to-use open-source LLM inference and serving engine. Simon discusses the collaboration with AMD, highlighting performance gains on the AMD Instinct™ MI300X GPU. Learn about the innovative scheduling framework, piecewise device graphs, and optimization techniques such as prefix caching and speculative decoding. Discover how vLLM integrates with the AMD ROCm™ software platform to deliver lower latency and higher throughput for LLMs.
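
To give a flavor of the features mentioned above, here is a minimal sketch (not from the talk itself) of running vLLM's offline Python API with prefix caching enabled; the same API works on ROCm builds of vLLM, and the model name and sampling values are illustrative only:

```python
from vllm import LLM, SamplingParams

# enable_prefix_caching lets vLLM reuse KV-cache blocks for shared
# prompt prefixes across requests, reducing redundant prefill work.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    enable_prefix_caching=True,
)

sampling = SamplingParams(temperature=0.8, max_tokens=64)

# Requests sharing a common prompt prefix benefit from the cached blocks.
outputs = llm.generate(
    ["Explain prefix caching in one sentence.",
     "Explain prefix caching in one sentence. Now give an example."],
    sampling,
)
for out in outputs:
    print(out.outputs[0].text)
```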

Find the resources you need to develop using AMD products:

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.









