Vulkan is the easiest way to run LLMs locally on your GPU while still getting great performance. Although there are faster options for Nvidia GPUs, such as ExLlamaV2, Vulkan is simpler to set up and is the best choice for AMD GPUs. I used an RX 9060 XT 16GB with CachyOS for the demo, but this will work on any Linux distro, and there are also builds for Windows and Mac.

LLMs and other AI models can be found at

Here's the command used in the video:
./llama-server -hf unsloth/gemma-3-27b-it-GGUF:Q3_K_S -fa on -ngl 100

Check out my AI/ML playlist:

These are affiliate links where I earn a small commission on purchases at no extra cost to you. This is the easiest way to help the channel, thank you!
Amazon:
Website:

Donations
Buy me a coffee:

Chapters:
00:00 Intro
01:49 Downloading Vulkan
02:39 Choosing a model
05:04 Running the model
10:08 Other helpful tips
11:50 Outro
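Once llama-server is running with the command above, it serves an OpenAI-compatible HTTP API (by default on http://localhost:8080). A minimal Python sketch of building a chat request for it; the prompt, model name, and max_tokens value are illustrative, not from the video:

```python
import json
from urllib import request  # stdlib; used only in the commented-out send below

# Build an OpenAI-style chat completion request body. llama-server runs a
# single model, so the "model" field can be any string.
payload = {
    "model": "gemma-3-27b-it",  # illustrative name
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 128,
}
body = json.dumps(payload).encode()

# To actually send it (requires the server to be running):
# req = request.Request("http://localhost:8080/v1/chat/completions",
#                       data=body,
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read().decode())
```

Any OpenAI-compatible client (curl, a chat UI, or the openai Python package pointed at the local base URL) can talk to the same endpoint.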











