
Mac Studio M3 Ultra: what size LLMs will it be able to run and how fast?

The Mac Studio M3 Ultra just landed in 2025, featuring up to 512 GB of RAM and 800 GB/s of memory bandwidth. In this video, I make some predictions about what size of LLMs you will be able to fit into this machine and at what speeds you will be able to run them. Comment from the future whether my predictions turned out right. Spoiler alert: do not get your hopes up too high for really big models like DeepSeek R1 at full, unquantized size - it will not fit into the RAM without quantization! I also predict that DeepSeek R1 will not run at decent speeds (10+ tokens/s) - otherwise I promise to publicly eat my hat here on this channel.

Links:
- LLM RAM Calculator, to estimate which LLMs fit into the amount of RAM you have:
- A useful article charting the inference options available today, including Apple silicon:
- Apple Silicon M-series inference performance:
- Mac Studio 2022 (M1) specs:
- Mac Studio 2023 (M2) specs:
- Mac Studio 2025 (M3 Ultra / M4 Max) specs:
- Exo (clustering Macs together for inference):

Let me know in the comments what burning questions you want answered! Subscribe to my channel for more tips on AI for managers, entrepreneurs, and business people, and on upcoming AI tools that will save you time and make you money.
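For reference, here is a back-of-envelope sketch (my own arithmetic, not the video's calculator) of the two numbers these predictions hinge on: the RAM footprint of a model at different quantization levels, and the theoretical tokens-per-second ceiling implied by 800 GB/s of memory bandwidth. The DeepSeek R1 figures used below (roughly 671B total parameters, ~37B active per token, since it is a mixture-of-experts model) and the 20% runtime overhead are assumptions for illustration.

```python
# Back-of-envelope estimates, assuming:
#   RAM needed        ~= parameter_count * bytes_per_weight * overhead
#   tokens/s ceiling  ~= memory_bandwidth / bytes_of_active_weights_per_token
# (decoding is memory-bound: each token requires streaming the active
# weights from RAM at least once)

def ram_needed_gb(params_b: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Rough RAM footprint in GB for a model with params_b billion weights.
    The overhead factor (~20%, an assumption) covers KV cache and buffers."""
    return params_b * (bits_per_weight / 8) * overhead

def tokens_per_s_ceiling(bandwidth_gbs: float, active_params_b: float,
                         bits_per_weight: float) -> float:
    """Upper bound on decode speed from memory bandwidth alone."""
    gb_per_token = active_params_b * (bits_per_weight / 8)
    return bandwidth_gbs / gb_per_token

# DeepSeek R1 (assumed): ~671B total parameters, ~37B active per token.
for bits in (16, 8, 4):
    print(f"R1 @ {bits}-bit: ~{ram_needed_gb(671, bits):.0f} GB RAM, "
          f"<= {tokens_per_s_ceiling(800, 37, bits):.0f} tok/s at 800 GB/s")
```

Under these assumptions, R1 at 16-bit needs on the order of 1.3-1.6 TB of RAM, far beyond 512 GB, which matches the spoiler above: it only fits after heavy quantization (around 4-bit). The tokens/s numbers are theoretical ceilings; real-world throughput is typically a fraction of the bandwidth-bound maximum.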