Methodology
How We Test AI Hardware
Every benchmark on The AI Desk follows a fixed protocol. We test on real consumer hardware using the same software and settings as our readers — no special drivers, no manufacturer samples, no cherry-picked runs.
Testing Principles
- ✓ Stock configuration: No overclocking, no custom BIOS settings, no power-limit unlocks. We test what you get out of the box.
- ✓ Real workloads: We use the same tools our readers use (Ollama, llama.cpp, ComfyUI), not synthetic GPU benchmarks.
- ✓ Consistent environment: Tests run with background apps closed and the system idle for 10 minutes before benchmarking. 3 runs, median reported.
- ✓ Cross-referenced results: Our numbers are validated against community benchmarks from r/LocalLLaMA and llama.cpp issue threads. Outliers are investigated, not published.
- ✓ Version transparency: Every result is tagged with the model, quantization format, tool version, and test date.
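The run-three-times, report-the-median protocol above can be sketched as a small helper (`report_benchmark` and `run_fn` are hypothetical names for illustration, not site tooling):

```python
import statistics

def report_benchmark(run_fn, n_runs=3):
    """Run a benchmark n_runs times and report the median result.

    run_fn is a placeholder for one complete benchmark run that
    returns a single number (e.g. tokens/second or seconds/image).
    """
    results = [run_fn() for _ in range(n_runs)]
    return statistics.median(results)
```

The median is used rather than the mean so a single thermally throttled or interrupted run cannot skew the published number.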
Benchmark Definitions
Llama 3.1 8B (Q4_K_M)
Metric: tokens/second (t/s). Tool: Ollama 0.3+
Run `ollama run llama3.1:8b --verbose` with a 500-token prompt and take the generation speed from the `eval rate` reported in the verbose stats. 3 runs, median reported.
Llama 3.1 70B (Q4_K_M)
Metric: tokens/second (t/s). Tool: Ollama 0.3+
Same method as 8B. Only reported for hardware with sufficient VRAM/RAM to load the model fully without CPU offload.
Stable Diffusion XL (SDXL)
Metric: seconds per image (1024×1024). Tool: ComfyUI
20-step DPM++ 2M Karras, 1024×1024, no ControlNet. 3 runs, median reported. GPU-accelerated path only.
FLUX.1-dev
Metric: seconds per image (1024×1024). Tool: ComfyUI
20 steps, 1024×1024, fp8 checkpoint. Reported only where VRAM ≥ 12GB. 3 runs, median.
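For the Llama benchmarks above, recent Ollama versions print per-run stats when invoked with `--verbose`, including an `eval rate` line for generation speed. A minimal sketch of extracting that number (the function name is ours, and the exact stats layout is an assumption about your Ollama version):

```python
import re

def parse_eval_rate(verbose_output: str) -> float:
    """Extract generation speed (tokens/s) from `ollama run --verbose` stats.

    Anchored at line start so the separate `prompt eval rate` line
    (prompt-processing speed) is not matched by mistake.
    """
    matches = re.findall(
        r"^eval rate:\s+([\d.]+) tokens/s", verbose_output, re.MULTILINE
    )
    if not matches:
        raise ValueError("no eval rate line found in output")
    return float(matches[-1])
```

Distinguishing `eval rate` from `prompt eval rate` matters: the published metric is sustained generation speed, not prompt ingestion.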
What We Don't Test
- ✕ Training performance — all results are inference only. Training requires different hardware priorities.
- ✕ Multi-GPU setups — all benchmarks are single-GPU or single-system. No NVLink, no tensor parallelism.
- ✕ CPU-only inference for GPU products — if a GPU is reviewed, we only report GPU-accelerated inference.
- ✕ API-based models — no cloud inference times. Local hardware only.
Data Sources
Some products are benchmarked by us directly. Others use verified community data from:
- → r/LocalLLaMA: Community benchmark megathreads for new hardware launches
- → llama.cpp GitHub Issues: Official performance tracking and regression tests
- → Simon Willison's Weblog: Independent Apple Silicon AI benchmarks
- → Manufacturer spec sheets: Memory bandwidth, TDP, and core count — taken as given, not independently verified
Update Policy
Benchmark results are updated when a new driver or software version causes a meaningful performance change (≥10%). Each product page shows a "Last Updated" date. If you notice a benchmark that no longer matches your real-world results, the most likely explanation is a driver update — check the date and compare to your software version.
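The ≥10% threshold above amounts to a simple relative-change check. A hypothetical helper (not site code) showing how a re-run would be flagged:

```python
def needs_update(published: float, rerun: float, threshold: float = 0.10) -> bool:
    """Return True when a re-run differs from the published result
    by at least `threshold` (10%) in either direction."""
    if published == 0:
        return rerun != 0
    return abs(rerun - published) / published >= threshold
```

Both regressions and improvements trigger an update, since driver releases can move performance in either direction.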