Run DeepSeek R1 Locally with Ollama
How to run DeepSeek R1 (8B and 70B) locally on your own hardware using Ollama — hardware requirements, speed expectations, and tips.
- Speed: 20–28 tok/s (8B on M4 Pro)
- Min Memory: 8 GB
- Software: Ollama
Step-by-Step Setup
01. Install Ollama
Ollama supports DeepSeek R1 out of the box via its model registry, so the only prerequisite is Ollama itself on macOS, Linux, or Windows.
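If you don't already have Ollama installed, the Linux install script and the Homebrew formula below are both published by Ollama (Windows users can grab the installer from ollama.com):

# Linux: official install script
curl -fsSL https://ollama.com/install.sh | sh
# macOS: Homebrew
brew install ollama

Then confirm the CLI is available: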
ollama --version
02. Pull DeepSeek R1
Choose the size that fits your hardware. The 8B model needs ~5 GB VRAM; the 70B needs ~40 GB (GPU + RAM offload).
# 8B — best for 8–16 GB systems
ollama pull deepseek-r1:8b
# 70B — best for 24 GB+ unified or multi-GPU
ollama pull deepseek-r1:70b
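A quick way to confirm the download finished (standard Ollama CLI, nothing model-specific):

ollama list

The output lists each deepseek-r1 tag you pulled along with its on-disk size.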
03. Run with extended context
DeepSeek R1 emits its chain-of-thought before every answer, so outputs run long; increase the context window for complex tasks. The documented way to raise the limit is the num_ctx parameter, set here from inside the REPL:

ollama run deepseek-r1:8b
>>> /set parameter num_ctx 16384
>>> Solve: if x² + 2x - 8 = 0, find x
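If you drive the local server from scripts instead of the REPL, the same setting goes through the documented num_ctx option of Ollama's REST API; this curl call is a minimal sketch against the default localhost port:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Solve: if x² + 2x - 8 = 0, find x",
  "options": { "num_ctx": 16384 },
  "stream": false
}'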
04. Use the thinking tags
DeepSeek R1's outputs include <think> blocks showing the reasoning chain. These are normal; the final answer follows after the closing tag.
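An illustrative (not verbatim) response shape:

<think>
Factor the quadratic: x² + 2x - 8 = (x + 4)(x - 2), so x = -4 or x = 2.
</think>
The solutions are x = -4 and x = 2.

If a script only needs the final answer, one rough approach (assuming your Ollama version prints the tags raw and on their own lines, as they usually are) is to drop the block with sed:

ollama run deepseek-r1:8b "Solve: if x² + 2x - 8 = 0, find x" | sed '/<think>/,/<\/think>/d'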
Optimization Tips
- DeepSeek R1 8B rivals GPT-4o on several reasoning benchmarks, making it one of the strongest local reasoning models at this size.
- The <think> reasoning tokens count toward your context window; increase it for complex multi-step problems (a Modelfile sketch for making that persistent follows this list).
- On Apple Silicon (M4 Pro), R1 8B runs at ~20–28 tok/s; on an RTX 5070, expect ~55–65 tok/s for the 8B variant. You can measure your own rate with the --verbose flag shown after this list.
- R1 70B requires GPU offload on most consumer hardware; ensure ≥ 64 GB of system RAM alongside your GPU.
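To check throughput on your own machine, Ollama's --verbose flag prints timing stats after each response, including the eval rate in tokens per second:

ollama run deepseek-r1:8b --verbose "Summarize the quadratic formula in one line."

And if you find yourself raising the context window every session, it can be simpler to bake the setting into a derived model. PARAMETER num_ctx is part of Ollama's documented Modelfile syntax; the name r1-8b-16k is just an illustrative choice:

# Modelfile: R1 8B with a persistent 16K context
FROM deepseek-r1:8b
PARAMETER num_ctx 16384

Create and run it like any other local tag:

ollama create r1-8b-16k -f Modelfile
ollama run r1-8b-16k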
Related Guides
Run Llama 3.3 70B on Mac Mini M4 Pro
Complete guide to running Llama 3.3 70B (Q4) locally on the Mac Mini M4 Pro with 24 GB unified memory using Ollama.
Run Llama 3.1 70B on RTX 5070
How to run Llama 3.1 70B (Q4) on an RTX 5070 12 GB using Ollama — includes VRAM limits, layer offload settings, and expected speed.