Run DeepSeek R1 Locally with Ollama
How to run DeepSeek R1 (8B and 70B) locally on your own hardware using Ollama — hardware requirements, speed expectations, and tips.
- Speed: 20–28 tok/s (8B on M4 Pro)
- Min Memory: 8 GB
- Software: Ollama
Step-by-Step Setup
01. Install Ollama
Ollama supports DeepSeek R1 out of the box via its model registry, so the only prerequisite is Ollama itself on macOS, Linux, or Windows.
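If you don't already have Ollama installed, the Linux install script and the Homebrew formula below are both published by Ollama (Windows users can grab the installer from ollama.com):

# Linux: official install script
curl -fsSL https://ollama.com/install.sh | sh
# macOS: Homebrew
brew install ollama

Then confirm the CLI is available: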
ollama --version
02. Pull DeepSeek R1
Choose the size that fits your hardware. The 8B model needs ~5 GB VRAM; the 70B needs ~40 GB (GPU + RAM offload).
# 8B — best for 8–16 GB systems
ollama pull deepseek-r1:8b
# 70B — best for 24 GB+ unified or multi-GPU
ollama pull deepseek-r1:70b
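A quick way to confirm the download finished (standard Ollama CLI, nothing model-specific):

ollama list

The output lists each deepseek-r1 tag you pulled along with its on-disk size.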
03. Run with extended context
DeepSeek R1 emits its chain-of-thought before every answer, so outputs run long; increase the context window for complex tasks. The documented way to raise the limit is the num_ctx parameter, set here from inside the REPL:

ollama run deepseek-r1:8b
>>> /set parameter num_ctx 16384
>>> Solve: if x² + 2x - 8 = 0, find x
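If you drive the local server from scripts instead of the REPL, the same setting goes through the documented num_ctx option of Ollama's REST API; this curl call is a minimal sketch against the default localhost port:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Solve: if x² + 2x - 8 = 0, find x",
  "options": { "num_ctx": 16384 },
  "stream": false
}'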
04. Use the thinking tags
DeepSeek R1's outputs include <think> blocks showing the reasoning chain. These are normal; the final answer follows after the closing tag.
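An illustrative (not verbatim) response shape:

<think>
Factor the quadratic: x² + 2x - 8 = (x + 4)(x - 2), so x = -4 or x = 2.
</think>
The solutions are x = -4 and x = 2.

If a script only needs the final answer, one rough approach (assuming your Ollama version prints the tags raw and on their own lines, as they usually are) is to drop the block with sed:

ollama run deepseek-r1:8b "Solve: if x² + 2x - 8 = 0, find x" | sed '/<think>/,/<\/think>/d'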
Optimization Tips
- DeepSeek R1 8B rivals GPT-4o on several reasoning benchmarks, making it one of the strongest local reasoning models at this size.
- The <think> reasoning tokens count toward your context window; increase it for complex multi-step problems (a Modelfile sketch for making that persistent follows this list).
- On Apple Silicon (M4 Pro), R1 8B runs at ~20–28 tok/s; on an RTX 5070, expect ~55–65 tok/s for the 8B variant. You can measure your own rate with the --verbose flag shown after this list.
- R1 70B requires GPU offload on most consumer hardware; ensure ≥ 64 GB of system RAM alongside your GPU.
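To check throughput on your own machine, Ollama's --verbose flag prints timing stats after each response, including the eval rate in tokens per second:

ollama run deepseek-r1:8b --verbose "Summarize the quadratic formula in one line."

And if you find yourself raising the context window every session, it can be simpler to bake the setting into a derived model. PARAMETER num_ctx is part of Ollama's documented Modelfile syntax; the name r1-8b-16k is just an illustrative choice:

# Modelfile: R1 8B with a persistent 16K context
FROM deepseek-r1:8b
PARAMETER num_ctx 16384

Create and run it like any other local tag:

ollama create r1-8b-16k -f Modelfile
ollama run r1-8b-16k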
Related Guides
Run Llama 3.3 70B on Mac Mini M4 Pro
Complete guide to running Llama 3.3 70B (Q4) locally on the Mac Mini M4 Pro with 24 GB unified memory using Ollama.
Run Llama 3.1 70B on RTX 5070
How to run Llama 3.1 70B (Q4) on an RTX 5070 12 GB using Ollama — includes VRAM limits, layer offload settings, and expected speed.