What is an NVMe SSD?
High-speed solid-state storage that connects over the PCIe bus. It determines how quickly models load into memory at startup: a PCIe 4.0 NVMe drive loads a 7B model in ~2 seconds, versus ~15 seconds on a SATA SSD.
Full Explanation
NVMe (Non-Volatile Memory Express) SSDs connect directly to the PCIe bus, delivering sequential read speeds of 5,000–14,000 MB/s versus 500–600 MB/s for SATA SSDs. For local AI, NVMe speed affects model load time — the interval between starting Ollama with a new model and generating the first token. A 4 GB Q4 7B model loads in roughly 1–2 seconds on a PCIe 4.0 NVMe, 3–5 seconds on PCIe 3.0, and 12–18 seconds on a SATA SSD. Once the model is loaded, inference speed is determined by VRAM/RAM bandwidth, not by storage.
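The load-time estimates above follow from simple size-over-throughput arithmetic. A minimal sketch in Python (the drive speeds below are typical mid-range figures, not benchmarks of any specific drive, and real loads add filesystem and allocation overhead, so treat the results as optimistic lower bounds):

```python
# Back-of-the-envelope model load time: file size divided by the
# drive's sequential read throughput. Speeds are typical figures
# for each interface, not measured values.
TYPICAL_SEQ_READ_MBPS = {
    "PCIe 4.0 NVMe": 7000,
    "PCIe 3.0 NVMe": 3500,
    "SATA SSD": 550,
}

def load_seconds(model_size_gb: float, seq_read_mbps: float) -> float:
    """Seconds to stream a model file at a given sequential read speed."""
    return model_size_gb * 1024 / seq_read_mbps

for drive, mbps in TYPICAL_SEQ_READ_MBPS.items():
    # 4 GB is the size of the Q4 7B model used in the text above.
    print(f"{drive}: ~{load_seconds(4.0, mbps):.1f} s")
```

The gap between the estimate (~7.4 s for SATA) and the observed 12–18 s is the per-load overhead that raw throughput arithmetic ignores.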
Why It Matters for Local AI
NVMe matters most if you frequently switch between models — a common pattern when running several specialized models for different tasks. If you load one model and keep it running, a SATA SSD is adequate. Modern mini PCs and Macs ship with NVMe as standard; verify the PCIe generation (4.0 preferred) when comparing storage specs.
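On Linux you can check which PCIe generation a drive actually negotiated (as opposed to what the spec sheet claims) via the kernel's sysfs attributes. A small sketch, assuming the drive is `nvme0` and noting that the exact speed-string format varies by kernel version, so the parser only reads the leading GT/s number:

```python
from pathlib import Path

# Per-lane transfer rate (GT/s) -> PCIe generation.
GEN_BY_GTS = {2.5: "1.0", 5.0: "2.0", 8.0: "3.0", 16.0: "4.0", 32.0: "5.0"}

def pcie_gen(link_speed: str) -> str:
    """Map a sysfs speed string like '16.0 GT/s PCIe' to a generation."""
    gts = float(link_speed.split()[0])
    gen = GEN_BY_GTS.get(gts)
    return f"PCIe {gen}" if gen else f"unrecognized link speed: {link_speed}"

def nvme_pcie_gen(dev: str = "nvme0") -> str:
    """Read the negotiated link speed of an NVMe controller (Linux only)."""
    path = Path(f"/sys/class/nvme/{dev}/device/current_link_speed")
    return pcie_gen(path.read_text().strip())

# Example: pcie_gen("16.0 GT/s PCIe") -> 'PCIe 4.0'
```

Running `nvme_pcie_gen()` on the target machine also catches drives installed in a slot that only wires up an older generation, where the negotiated speed is lower than the drive's rating.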
Related Terms
PCIe→
Peripheral Component Interconnect Express — the high-speed bus that connects GPUs, NVMe drives, and other expansion devices to the CPU. PCIe 4.0 or 5.0 is needed for fast model offloading when VRAM is exceeded.
VRAM→
Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.
Unified Memory→
A single pool of fast RAM on Apple Silicon, shared between the CPU and GPU. Larger unified memory = larger models run entirely at full bandwidth — no PCIe bottleneck.
CPU Inference→
Running LLMs on the CPU rather than a GPU. Works on any hardware, no special drivers needed. Limited to ~8–12 t/s on 7B models — fine for background tasks, slow for interactive use.