Software & Frameworks

What is GGUF?

The standard file format for quantized LLMs used by llama.cpp and Ollama. Replaces the older GGML format. Stores model weights and metadata in a single portable file.

Full Explanation

GGUF (GPT-Generated Unified Format) is the model file format introduced by llama.cpp in 2023 and now the de facto standard container for quantized open-source LLMs. A GGUF file encodes the model architecture, vocabulary, tokenizer configuration, quantization type, and all weight tensors in a single binary file. Ollama downloads and manages GGUF files behind the scenes; if you download models manually from Hugging Face, you'll typically choose between variants like "Llama-3.1-8B-Q4_K_M.gguf" (smaller, faster) and "Llama-3.1-8B-Q8_0.gguf" (larger, higher quality).
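To make the "single binary file" structure concrete, here is a minimal sketch of parsing the fixed-size GGUF header, based on the published GGUF layout (little-endian magic, version, tensor count, metadata key/value count). The synthetic byte string is fabricated for illustration, not taken from a real model file:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header.

    Layout per the GGUF spec (little-endian):
      4 bytes  magic  b"GGUF"
      uint32   version (3 is current)
      uint64   tensor_count
      uint64   metadata_kv_count
    """
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header for demonstration only (counts are made up):
fake = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(fake))  # {'version': 3, 'tensors': 291, 'metadata_kv': 24}
```

After this header come the metadata key/value pairs (architecture, tokenizer, quantization info) and the tensor index, which is why one file is enough to load and run the model.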

Why It Matters for Local AI

GGUF files are self-contained and hardware-agnostic: the same file runs on Apple Silicon, NVIDIA and AMD GPUs, and plain CPUs. The suffix in the filename encodes the quantization level. Q4_K_M is the community sweet spot between size and output quality; Q8_0 needs roughly twice the memory for a modest quality gain.
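The "roughly twice the memory" claim can be checked with back-of-envelope arithmetic: file size is parameter count times effective bits per weight. The bits-per-weight figures below are approximations (they vary slightly by model and tensor mix), not exact values from any specific file:

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough quantized model size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8

# Approximate effective bits per weight (assumed figures):
for name, bpw in [("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
    print(f"Llama-3.1-8B {name}: ~{gguf_size_gb(8.0, bpw):.1f} GB")
# Llama-3.1-8B Q4_K_M: ~4.8 GB
# Llama-3.1-8B Q8_0: ~8.5 GB
```

Add roughly 1-2 GB on top of the file size for the KV cache and runtime overhead when estimating whether a model fits in VRAM.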

Hardware Relevant to GGUF

GIGABYTE GeForce RTX 5070 WINDFORCE OC 12G

GPU · 12 GB VRAM · 672 GB/s

Apple Mac Mini (M4, 2024)

Mini PC · 16 GB unified memory · 120 GB/s


Related Terms