This video shows how to run inference with large language models (LLMs) such as DeepSeek-R1 on multiple computers (servers) with multiple NVIDIA A100 GPUs (80 GB). In this example, the installation is done on four servers with 16 GPUs. The operating system is Rocky Linux 9.6.

Step-by-step:
- Installation of Ray Cluster
- Installation of vLLM

Mentioned in this video: Ollama, PocketPal AI, model distillation and quantization.

Commands are in the text version of this video:
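
As a rough illustration of how the two steps fit together (not the exact commands from the video), the Python sketch below only assembles the command lines that are typically involved: `ray start` on the head and worker nodes, then `vllm serve` on the head node. Several things here are assumptions: the helper names are hypothetical, the head IP `10.0.0.1` and port `6379` are placeholders, the model ID `deepseek-ai/DeepSeek-R1` is the Hugging Face repository name, and the split of 4 nodes with 4 GPUs each is one possible reading of "four servers with 16 GPUs". The sketch follows the common practice of using tensor parallelism inside a node and pipeline parallelism across nodes.

```python
import shlex


def ray_head_command(port: int = 6379) -> list[str]:
    """Command to start the Ray head node (run on the first server)."""
    return ["ray", "start", "--head", f"--port={port}"]


def ray_worker_command(head_ip: str, port: int = 6379) -> list[str]:
    """Command to join a worker node to the Ray cluster (run on every other server)."""
    return ["ray", "start", f"--address={head_ip}:{port}"]


def vllm_serve_command(model: str, nodes: int, gpus_per_node: int) -> list[str]:
    """Command to serve a model with vLLM on top of the Ray cluster.

    Assumption: tensor parallelism spans the GPUs within one node,
    pipeline parallelism spans the nodes.
    """
    if nodes < 1 or gpus_per_node < 1:
        raise ValueError("nodes and gpus_per_node must be positive")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(gpus_per_node),
        "--pipeline-parallel-size", str(nodes),
        "--distributed-executor-backend", "ray",
    ]


if __name__ == "__main__":
    head_ip = "10.0.0.1"  # placeholder address, an assumption
    # Assumed layout: 4 servers x 4 GPUs = 16 GPUs in total.
    commands = [
        ray_head_command(),
        ray_worker_command(head_ip),
        vllm_serve_command("deepseek-ai/DeepSeek-R1", nodes=4, gpus_per_node=4),
    ]
    for command in commands:
        print(shlex.join(command))
```

The script only prints the command lines so they can be reviewed before being run on the servers; it does not start Ray or vLLM itself.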











