What is PCIe?
Peripheral Component Interconnect Express — the bus connecting a discrete GPU to the motherboard. A PCIe 4.0 or 5.0 link keeps model offloading fast when a model exceeds VRAM.
Full Explanation
PCIe (Peripheral Component Interconnect Express) is the high-speed serial interface connecting discrete GPUs to the CPU and system memory. PCIe 4.0 x16 provides ~32 GB/s of bandwidth in each direction; PCIe 5.0 doubles this to ~64 GB/s. For inference that fits entirely in VRAM, PCIe speed barely matters once the model is loaded; it mainly affects load time. It becomes the bottleneck when a model overflows VRAM and layers must stream from system RAM — at that point PCIe bandwidth (32–64 GB/s), not GPU memory bandwidth (e.g., 672 GB/s on a current GDDR7 card), caps throughput.
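The headline numbers fall straight out of the per-lane transfer rate and the 128b/130b line encoding used by PCIe generations 3 through 5. A minimal sketch of the arithmetic:

```python
# Back-of-the-envelope PCIe x16 throughput per direction.
# Per-lane rates and the 128b/130b encoding are the published figures;
# results are raw link bandwidth, before protocol overhead.

GENERATIONS = {
    # generation: (GT/s per lane, encoding efficiency)
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
}

LANES = 16  # full x16 slot

for gen, (gts, encoding) in GENERATIONS.items():
    # GT/s is billions of bits per lane per second; divide by 8 for bytes
    gb_per_s = gts * encoding * LANES / 8
    print(f"PCIe {gen} x16: ~{gb_per_s:.1f} GB/s per direction")

# PCIe 3.0 x16: ~15.8 GB/s per direction
# PCIe 4.0 x16: ~31.5 GB/s per direction
# PCIe 5.0 x16: ~63.0 GB/s per direction
```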
Why It Matters for Local AI
If your model fits in VRAM, PCIe generation is irrelevant to inference speed. If you're intentionally offloading some layers to system RAM (e.g., running a 70B model on a 12 GB GPU), a PCIe 4.0 or newer link minimizes the penalty on those offloaded layers.
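To see why the link generation dominates once layers spill, estimate per-token time as bytes read from each memory pool divided by that pool's bandwidth. A rough sketch under the document's streaming model; the model size, VRAM split, and bandwidth figures below are illustrative assumptions, not measurements:

```python
# Rough per-token latency model for partial offload. Assumes every weight
# byte is read once per token: VRAM-resident weights at GPU memory
# bandwidth, spilled weights streamed over PCIe. All figures illustrative.

MODEL_BYTES = 40e9   # e.g., a 70B model quantized to ~4.5 bits/weight
VRAM_BYTES = 10e9    # what fits on a 12 GB card after overhead
GPU_BW = 672e9       # GDDR7 card from the example above, bytes/s
PCIE4_BW = 31.5e9    # PCIe 4.0 x16, per direction
PCIE3_BW = 15.8e9    # PCIe 3.0 x16, for comparison

def tokens_per_s(pcie_bw: float) -> float:
    resident = VRAM_BYTES / GPU_BW                  # fast path: ~15 ms
    spilled = (MODEL_BYTES - VRAM_BYTES) / pcie_bw  # slow path dominates
    return 1.0 / (resident + spilled)

print(f"PCIe 4.0: ~{tokens_per_s(PCIE4_BW):.2f} t/s")  # ~1.0 t/s
print(f"PCIe 3.0: ~{tokens_per_s(PCIE3_BW):.2f} t/s")  # ~0.5 t/s
```

Under these assumptions, streaming 30 GB over the bus each token costs roughly a full second, so halving PCIe bandwidth roughly halves throughput while the GPU sits mostly idle.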
Hardware Relevant to PCIe
GPU · 12 GB VRAM · 672 GB/s memory bandwidth
GPU · 16 GB VRAM · 288 GB/s memory bandwidth
Related Terms
VRAM→
Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.
Memory Bandwidth→
How fast data moves between memory and the processor, measured in GB/s. Tokens per second scales nearly linearly with bandwidth — this is the single most important GPU spec for LLM speed (see the worked estimate after this list).
CPU Inference→
Running LLMs on the CPU rather than a GPU. Works on any hardware, no special drivers needed. Limited to ~8–12 t/s on 7B models — fine for background tasks, slow for interactive use.
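The bandwidth rule of thumb above reduces to a single division: tokens per second is capped at memory bandwidth divided by the bytes read per token, which for dense decoding is roughly the model's size on disk. A minimal sketch; the 4 GB model size and the bandwidth figures are assumptions for illustration:

```python
# Upper-bound decode speed: every weight byte is read once per token,
# so t/s <= bandwidth / model_bytes. All figures below are assumptions.

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# A 7B model quantized to ~4 GB, on three memory systems:
print(max_tokens_per_s(672, 4))   # GDDR7 GPU:            ~168 t/s ceiling
print(max_tokens_per_s(90, 4))    # dual-channel DDR5:     ~22 t/s ceiling
print(max_tokens_per_s(31.5, 4))  # streamed over PCIe 4.0: ~8 t/s ceiling
```

The ceilings line up with the figures quoted in this glossary: real CPU inference lands below the ~22 t/s DDR5 ceiling (hence ~8–12 t/s), and anything forced across PCIe drops to single digits.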