Memory & Storage

What is VRAM?

Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.

Full Explanation

VRAM (Video RAM) is high-speed memory mounted on the graphics card right next to the GPU die, with bandwidth of up to 672 GB/s on the RTX 5070. Unlike system RAM, it has a direct, wide path to the GPU's compute cores, so the chip can read model weights at full speed. When a language model's weights exceed available VRAM, runtimes such as Ollama and llama.cpp can split the model and keep the remaining layers in system RAM. Those layers are then limited by the PCIe link (~30–60 GB/s for an x16 PCIe 4.0 or 5.0 slot) instead of hundreds of GB/s, which can cut tokens-per-second by 50–90%.
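
To see why even a small spill hurts so much, here is a rough back-of-the-envelope sketch. It assumes token generation is purely bandwidth-bound (every weight is read once per token) and ignores compute time, the KV cache, and caching effects, so treat the numbers as upper bounds rather than benchmarks. The function name and the 32 GB/s default for the slow path are illustrative assumptions, not values taken from Ollama or llama.cpp.

```python
def tokens_per_second(model_gb: float, offload_fraction: float,
                      vram_gbps: float = 672.0, slow_gbps: float = 32.0) -> float:
    """Estimate an upper bound on tokens/s for a partially offloaded model.

    Assumption: each generated token reads all weights once, so time per
    token is the VRAM-resident part divided by VRAM bandwidth plus the
    offloaded part divided by the slow-path bandwidth.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    fast_gb = model_gb * (1.0 - offload_fraction)  # weights held in VRAM
    slow_gb = model_gb * offload_fraction          # weights held in system RAM
    seconds_per_token = fast_gb / vram_gbps + slow_gb / slow_gbps
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    baseline = tokens_per_second(8.0, 0.0)
    for fraction in (0.0, 0.10, 0.25, 0.50):
        rate = tokens_per_second(8.0, fraction)
        drop = 100.0 * (1.0 - rate / baseline)
        print(f"{fraction:4.0%} offloaded: ~{rate:5.1f} tok/s ({drop:.0f}% slower)")
```

In this simplified model, moving just a quarter of an 8 GB model out of VRAM already dominates the per-token time, which is consistent with the 50–90% range quoted above.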

Why It Matters for Local AI

At Q4 quantization, the weights of a 7B model take ~4 GB of VRAM, a 13B model ~8 GB, and a 70B model ~40 GB. The context (KV cache) and runtime buffers need additional space on top of the weights, so if you want to run a 13B model with full GPU acceleration, 12 GB of VRAM is the practical minimum. The RTX 5070 sits right at that threshold.
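
The sizes above follow from simple arithmetic: parameter count times bits per weight, divided by 8. The sketch below assumes roughly 4.5 effective bits per weight for a Q4-style quantization and a flat 1.5 GB allowance for context and runtime buffers. Both figures are illustrative assumptions, and real usage varies with the quantization variant and context length.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8.0


def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: float = 4.5, overhead_gb: float = 1.5) -> bool:
    """Check whether weights plus an assumed fixed overhead fit in VRAM."""
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for size in (7, 13, 70):
        weights = estimate_weight_gb(size)
        verdict = "fits" if fits_in_vram(size, 12.0) else "does not fit"
        print(f"{size}B at ~4.5 bits/weight: ~{weights:.1f} GB, {verdict} in 12 GB")
```

Under these assumptions a 13B model fits on a 12 GB card but not on an 8 GB one, and a 70B model needs far more VRAM than any card listed here.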

Hardware Relevant to VRAM

GIGABYTE GeForce RTX 5070 WINDFORCE OC 12G

GPU · 12 GB VRAM · 672 GB/s

ASUS Prime GeForce RTX 5070 SFF-Ready 12GB

GPU · 12 GB VRAM · 672 GB/s

GIGABYTE Radeon RX 9060 XT GAMING OC 16G

GPU · 16 GB VRAM · 320 GB/s


Related Terms