Published by Pavlo Khmel HPC

vLLM and Ray cluster to run an LLM on multiple servers with multiple GPUs

This video shows how to run inference with large language models (LLMs) such as DeepSeek-R1 on multiple servers, each with multiple NVIDIA A100 (80 GB) GPUs. In this example, the installation is done on four servers with 16 GPUs in total, running Rocky Linux 9.6.

Step-by-step:
- Installation of Ray Cluster
- Installation of vLLM

Also mentioned in this video: Ollama, PocketPal AI, model distillation and quantization.

Commands are in the text version of this video:
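
For orientation, here is a minimal sketch (not taken from the video) of how a multi-node vLLM launch on top of a Ray cluster could look from Python. The model name, the tensor/pipeline parallel split across the 16 GPUs, and the prompt are illustrative assumptions; the exact commands used in the video are in the text version linked above.

# Hypothetical sketch: run on the Ray head node after all four servers have joined
# the cluster (e.g. "ray start --head" on the first node and
# "ray start --address=<head-ip>:6379" on the other three).
# Assumes Ray and vLLM are installed on every node.
import ray
from vllm import LLM, SamplingParams

# Attach to the already-running Ray cluster instead of starting a local one.
ray.init(address="auto")

llm = LLM(
    model="deepseek-ai/DeepSeek-R1",     # assumed model; substitute the one used in the video
    tensor_parallel_size=4,              # assumed: shard each layer across a node's 4 GPUs
    pipeline_parallel_size=4,            # assumed: split layers across the 4 nodes (4 x 4 = 16 GPUs)
    distributed_executor_backend="ray",  # let Ray place workers on the remote servers
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain what a Ray cluster is in one paragraph."], sampling)
print(outputs[0].outputs[0].text)

The parallelism split is a design choice: tensor parallelism is usually kept within a node (fast NVLink), while pipeline parallelism spans nodes connected over the network; a single tensor_parallel_size=16 across all nodes is also possible but typically slower without a fast interconnect.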