Run Ollama on a Mini PC (Intel/AMD)
How to run local LLMs with Ollama on an Intel or AMD mini PC — best models for 16 GB RAM, performance expectations, and optimization.
- Speed: 4–12 tok/s (CPU)
- Min Memory: 16 GB
- Software: Ollama, Linux or Windows 11
Step-by-Step Setup
Step 01: Install Ollama
Mini PCs without discrete GPUs run Ollama in CPU-only mode. Ollama detects this automatically and uses AVX2/AVX-512 SIMD where available.
# Linux
curl -fsSL https://ollama.com/install.sh | sh
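To confirm the install worked and see which SIMD extensions your CPU exposes (a quick sanity check, not a required step), standard Linux tools are enough:

ollama --version
grep -oE 'avx2|avx512f' /proc/cpuinfo | sort -u   # lists AVX2/AVX-512 support; no output means neither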
Step 02: Choose a model that fits in RAM
Rule of thumb: model file size ≤ 60% of total RAM, leaving headroom for the OS and apps. On 16 GB that caps model files at roughly 9.5 GB, which means Q4 quants of models up to about 14B parameters.
# Best options for a 16 GB mini PC:
ollama pull llama3.2:3b   # 2.0 GB - fastest
ollama pull phi4          # 9.1 GB - best quality
ollama pull llama3.1:8b   # 4.7 GB - balanced
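To apply the 60% rule on your own machine, compare total RAM against the sizes Ollama reports; a minimal sketch using standard Linux tools:

free -h | awk '/^Mem:/ {print "Total RAM:", $2}'
ollama list   # keep the SIZE column under ~60% of total RAM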
Step 03: Set thread count
CPU inference benefits from pinning threads to performance cores. Set OLLAMA_NUM_THREAD to your P-core count.
# Example: 8 P-cores
export OLLAMA_NUM_THREAD=8
ollama run llama3.2:3b
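If you are not sure how many P-cores your CPU has, lscpu can help; on Intel hybrid CPUs the P-cores report the highest maximum clock (a heuristic, not an official mapping):

lscpu --extended=CPU,CORE,MAXMHZ   # P-cores show the highest MAXMHZ
nproc                              # total logical CPUs, as an upper bound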
Step 04: Install Open WebUI
Add a browser-based UI for a full chat experience.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  ghcr.io/open-webui/open-webui:main
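Once the container is up, the UI is served on port 3000; a quick check from the same machine:

docker ps --filter ancestor=ghcr.io/open-webui/open-webui:main   # container running?
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3000    # expect 200

Then open http://localhost:3000 in a browser and sign up; the first account you create becomes the admin.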
Optimization Tips
- Mini PCs with Intel Core Ultra (Meteor Lake and newer) include an integrated NPU, but Ollama does not use it; inference runs on the CPU (or a supported GPU).
- Phi-4 14B Q4 (~9 GB) outperforms Llama 3.1 8B on reasoning tasks, at roughly twice the file size and a correspondingly lower token rate.
- RAM speed matters on mini PCs: CPU token generation is memory-bandwidth-bound, and DDR5-5600 delivers ~75% more peak bandwidth than DDR4-3200 (44.8 vs 25.6 GB/s per channel), which translates directly into faster tok/s.
- For always-on home assistant use, configure Ollama as a systemd service so it survives reboots; see the sketch below.
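On Linux, the official install script already registers an ollama.service unit, so this usually amounts to enabling it. A minimal sketch, assuming that standard install:

# Start Ollama now and enable it at every boot
sudo systemctl enable --now ollama

# Confirm it is running and follow its logs
systemctl status ollama
journalctl -u ollama -f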
Related Guides
Run Llama 3.1 8B on Mac Mini M4
Step-by-step guide to running Llama 3.1 8B locally on the Apple Mac Mini M4 using Ollama — no GPU required.
Run Llama 3.1 70B on RTX 5070
How to run Llama 3.1 70B (Q4) on an RTX 5070 12 GB using Ollama — includes VRAM limits, layer offload settings, and expected speed.