What is PCIe?
Peripheral Component Interconnect Express — the bus connecting a discrete GPU to the motherboard. A PCIe 4.0 or 5.0 link keeps model offloading fast when a model exceeds VRAM.
Full Explanation
PCIe (Peripheral Component Interconnect Express) is the high-speed serial interface connecting discrete GPUs to the CPU and system memory. PCIe 4.0 x16 provides ~32 GB/s of bandwidth in each direction; PCIe 5.0 doubles this to ~64 GB/s. For inference that fits entirely in VRAM, PCIe speed barely matters once the model is loaded; it mainly affects load time. It becomes the bottleneck when a model overflows VRAM and layers must stream from system RAM — at that point PCIe bandwidth (32–64 GB/s), not GPU memory bandwidth (e.g., 672 GB/s on a current GDDR7 card), caps throughput.
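The headline numbers fall straight out of the per-lane transfer rate and the 128b/130b line encoding used by PCIe generations 3 through 5. A minimal sketch of the arithmetic:

```python
# Back-of-the-envelope PCIe x16 throughput per direction.
# Per-lane rates and the 128b/130b encoding are the published figures;
# results are raw link bandwidth, before protocol overhead.

GENERATIONS = {
    # generation: (GT/s per lane, encoding efficiency)
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
}

LANES = 16  # full x16 slot

for gen, (gts, encoding) in GENERATIONS.items():
    # GT/s is billions of bits per lane per second; divide by 8 for bytes
    gb_per_s = gts * encoding * LANES / 8
    print(f"PCIe {gen} x16: ~{gb_per_s:.1f} GB/s per direction")

# PCIe 3.0 x16: ~15.8 GB/s per direction
# PCIe 4.0 x16: ~31.5 GB/s per direction
# PCIe 5.0 x16: ~63.0 GB/s per direction
```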
Why It Matters for Local AI
If your model fits in VRAM, PCIe generation is irrelevant to inference speed. If you're intentionally offloading some layers to system RAM (e.g., running a 70B model on a 12 GB GPU), a PCIe 4.0 or newer link minimizes the penalty on those offloaded layers.
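To see why the link generation dominates once layers spill, estimate per-token time as bytes read from each memory pool divided by that pool's bandwidth. A rough sketch under the document's streaming model; the model size, VRAM split, and bandwidth figures below are illustrative assumptions, not measurements:

```python
# Rough per-token latency model for partial offload. Assumes every weight
# byte is read once per token: VRAM-resident weights at GPU memory
# bandwidth, spilled weights streamed over PCIe. All figures illustrative.

MODEL_BYTES = 40e9   # e.g., a 70B model quantized to ~4.5 bits/weight
VRAM_BYTES = 10e9    # what fits on a 12 GB card after overhead
GPU_BW = 672e9       # GDDR7 card from the example above, bytes/s
PCIE4_BW = 31.5e9    # PCIe 4.0 x16, per direction
PCIE3_BW = 15.8e9    # PCIe 3.0 x16, for comparison

def tokens_per_s(pcie_bw: float) -> float:
    resident = VRAM_BYTES / GPU_BW                  # fast path: ~15 ms
    spilled = (MODEL_BYTES - VRAM_BYTES) / pcie_bw  # slow path dominates
    return 1.0 / (resident + spilled)

print(f"PCIe 4.0: ~{tokens_per_s(PCIE4_BW):.2f} t/s")  # ~1.0 t/s
print(f"PCIe 3.0: ~{tokens_per_s(PCIE3_BW):.2f} t/s")  # ~0.5 t/s
```

Under these assumptions, streaming 30 GB over the bus each token costs roughly a full second, so halving PCIe bandwidth roughly halves throughput while the GPU sits mostly idle.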
Hardware Relevant to PCIe
GPU · 12 GB VRAM · 672 GB/s memory bandwidth
GPU · 16 GB VRAM · 288 GB/s memory bandwidth
Related Terms
VRAM→
Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.
Memory Bandwidth→
How fast data moves between memory and the processor, measured in GB/s. Tokens per second scales nearly linearly with bandwidth — this is the single most important GPU spec for LLM speed (see the worked estimate after this list).
CPU Inference→
Running LLMs on the CPU rather than a GPU. Works on any hardware, no special drivers needed. Limited to ~8–12 t/s on 7B models — fine for background tasks, slow for interactive use.
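The bandwidth rule of thumb above reduces to a single division: tokens per second is capped at memory bandwidth divided by the bytes read per token, which for dense decoding is roughly the model's size on disk. A minimal sketch; the 4 GB model size and the bandwidth figures are assumptions for illustration:

```python
# Upper-bound decode speed: every weight byte is read once per token,
# so t/s <= bandwidth / model_bytes. All figures below are assumptions.

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# A 7B model quantized to ~4 GB, on three memory systems:
print(max_tokens_per_s(672, 4))   # GDDR7 GPU:            ~168 t/s ceiling
print(max_tokens_per_s(90, 4))    # dual-channel DDR5:     ~22 t/s ceiling
print(max_tokens_per_s(31.5, 4))  # streamed over PCIe 4.0: ~8 t/s ceiling
```

The ceilings line up with the figures quoted in this glossary: real CPU inference lands below the ~22 t/s DDR5 ceiling (hence ~8–12 t/s), and anything forced across PCIe drops to single digits.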