Run Ollama on a Mini PC (Intel/AMD)
How to run local LLMs with Ollama on an Intel or AMD mini PC — best models for 16 GB RAM, performance expectations, and optimization.
- Speed: 4–12 tok/s (CPU)
- Min Memory: 16 GB
- Software: Ollama, Linux or Windows 11
Step-by-Step Setup
Step 01: Install Ollama
Mini PCs without discrete GPUs run Ollama in CPU-only mode. Ollama detects this automatically and uses AVX2/AVX-512 SIMD where available.
# Linux
curl -fsSL https://ollama.com/install.sh | sh
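To confirm the install worked and see which SIMD extensions your CPU exposes (a quick sanity check, not a required step), standard Linux tools are enough:

ollama --version
grep -oE 'avx2|avx512f' /proc/cpuinfo | sort -u   # lists AVX2/AVX-512 support; no output means neither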
Step 02: Choose a model that fits in RAM
Rule of thumb: model file size ≤ 60% of total RAM, leaving headroom for the OS and apps. On 16 GB that caps model files at roughly 9.5 GB, which means Q4 quants of models up to about 14B parameters.
# Best options for a 16 GB mini PC:
ollama pull llama3.2:3b   # 2.0 GB - fastest
ollama pull phi4          # 9.1 GB - best quality
ollama pull llama3.1:8b   # 4.7 GB - balanced
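To apply the 60% rule on your own machine, compare total RAM against the sizes Ollama reports; a minimal sketch using standard Linux tools:

free -h | awk '/^Mem:/ {print "Total RAM:", $2}'
ollama list   # keep the SIZE column under ~60% of total RAM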
Step 03: Set thread count
CPU inference benefits from pinning threads to performance cores. Set OLLAMA_NUM_THREAD to your P-core count.
# Example: 8 P-cores
export OLLAMA_NUM_THREAD=8
ollama run llama3.2:3b
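If you are not sure how many P-cores your CPU has, lscpu can help; on Intel hybrid CPUs the P-cores report the highest maximum clock (a heuristic, not an official mapping):

lscpu --extended=CPU,CORE,MAXMHZ   # P-cores show the highest MAXMHZ
nproc                              # total logical CPUs, as an upper bound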
Step 04: Install Open WebUI
Add a browser-based UI for a full chat experience.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  ghcr.io/open-webui/open-webui:main
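Once the container is up, the UI is served on port 3000; a quick check from the same machine:

docker ps --filter ancestor=ghcr.io/open-webui/open-webui:main   # container running?
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000    # expect 200

Then open http://localhost:3000 in a browser and sign up; the first account you create becomes the admin.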
Optimization Tips
- Mini PCs with Intel Core Ultra (Meteor Lake and newer) include an integrated NPU, but Ollama does not use it; inference runs on the CPU (or a supported GPU).
- Phi-4 14B Q4 (~9 GB) outperforms Llama 3.1 8B on reasoning tasks, at roughly twice the file size and a correspondingly lower token rate.
- RAM speed matters on mini PCs: CPU token generation is memory-bandwidth-bound, and DDR5-5600 delivers ~75% more peak bandwidth than DDR4-3200 (44.8 vs 25.6 GB/s per channel), which translates directly into faster tok/s.
- For always-on home assistant use, configure Ollama as a systemd service so it survives reboots; see the sketch below.
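On Linux, the official install script already registers an ollama.service unit, so this usually amounts to enabling it. A minimal sketch, assuming that standard install:

# Start Ollama now and enable it at every boot
sudo systemctl enable --now ollama

# Confirm it is running and follow its logs
systemctl status ollama
journalctl -u ollama -f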
Related Guides
Run Llama 3.1 8B on Mac Mini M4
Step-by-step guide to running Llama 3.1 8B locally on the Apple Mac Mini M4 using Ollama — no GPU required.
Run Llama 3.1 70B on RTX 5070
How to run Llama 3.1 70B (Q4) on an RTX 5070 12 GB using Ollama — includes VRAM limits, layer offload settings, and expected speed.