What is GGUF?
The standard file format for quantized LLMs used by llama.cpp and Ollama. Replaces the older GGML format. Stores model weights and metadata in a single portable file.
Full Explanation
GGUF (GPT-Generated Unified Format) is the model file format introduced by llama.cpp in 2023, now the de facto standard container for quantized open-source LLMs. A GGUF file encodes the model architecture, vocabulary, tokenizer, quantization level, and all weight tensors in a single binary file. Ollama downloads and manages GGUF files behind the scenes; if you download models manually from Hugging Face, you'll typically choose between variants like "Llama-3.1-8B-Q4_K_M.gguf" (smaller, faster) and "Llama-3.1-8B-Q8_0.gguf" (larger, higher quality).
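To show how self-contained the format is, here is a minimal sketch that reads the fixed GGUF header (magic bytes, format version, tensor count, metadata count) in pure Python. It assumes a version 2+ file per the published GGUF spec (version 1 used 32-bit counts), and the filename is just an example:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor count,
    and metadata key/value count (GGUF spec, version 2+)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # All header integers are little-endian.
        version, = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, metadata_kv_count

# Example filename, as in the variants mentioned above:
version, tensors, kv = read_gguf_header("Llama-3.1-8B-Q4_K_M.gguf")
print(f"GGUF v{version}: {tensors} tensors, {kv} metadata entries")
```

Everything after this header (quantization type, tokenizer, architecture) is stored as metadata key/value pairs in the same file, which is why a single .gguf needs no sidecar config files.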
Why It Matters for Local AI
GGUF files are self-contained and hardware-agnostic — the same file runs on Apple Silicon, NVIDIA, AMD, and CPU. The filename encodes the quantization level: Q4_K_M is the community sweet spot for balancing size and quality, while Q8_0 needs roughly double the VRAM for modestly better output.
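To make the VRAM trade-off concrete, here is a rough sizing sketch. The bits-per-weight figures are approximate averages reported by llama.cpp, not exact values, and real files add a small metadata overhead:

```python
# Approximate bits per weight for common GGUF quantization levels.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Estimate file size / weight memory from parameter count and quant level."""
    total_bytes = params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return total_bytes / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"8B model at {quant}: ~{approx_size_gb(8, quant):.1f} GB")
# -> roughly 4.9 GB (Q4_K_M), 8.5 GB (Q8_0), 16 GB (FP16)
```

These estimates line up with real downloads: Llama-3.1-8B at Q4_K_M is about 4.9 GB on Hugging Face. Add a margin for the KV cache and runtime overhead when deciding whether a model fits your VRAM.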
Related Terms
Quantization
Compressing a model by reducing numeric precision. Q4 = 4-bit (smallest, fastest), Q8 = 8-bit (near-lossless), FP16 = full precision. Fewer bits mean less VRAM required, at a slight cost in output quality.
Ollama
Free open-source tool for running LLMs locally on macOS, Linux, and Windows. Download a model with a single command. No cloud account required. Supports Llama, Mistral, Qwen, Phi, and more.
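Beyond the command line, Ollama exposes a local API. A minimal sketch using its official Python client (pip install ollama); the model name is an example and assumes you have already pulled it and the Ollama server is running:

```python
import ollama  # official Python client for a locally running Ollama server

# Chat with a locally pulled model (pull it first with: ollama pull llama3.1)
response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
)
print(response["message"]["content"])
```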
llama.cpp
The foundational C++ inference engine for running quantized LLMs locally. Powers Ollama, LM Studio, and most local AI tools under the hood. Supports CPU, CUDA, ROCm, and Metal.
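If you want to load a GGUF file directly rather than through Ollama, a minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python); the file path is a placeholder:

```python
from llama_cpp import Llama  # Python bindings to llama.cpp

# Load a GGUF file directly from disk; path is an example.
llm = Llama(model_path="Llama-3.1-8B-Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: What does GGUF stand for?\nA:", max_tokens=64, stop=["\n"])
print(out["choices"][0]["text"])
```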