How-To · 6 min read · April 22, 2026 · By Alex Voss

Running DeepSeek R1 Locally: Which Hardware Can Handle It?

DeepSeek R1 went viral for matching OpenAI o1-level reasoning at a fraction of the training cost. Now everyone wants to run it locally. The good news: the distilled variants (7B, 14B, 32B) run on consumer hardware. The bad news: the full 671B model needs enterprise-grade memory. Here's exactly what you need for each variant.

TL;DR: DeepSeek R1 7B runs on 8 GB VRAM, 14B needs 12 GB, 32B needs 24 GB. The full 671B model is cloud-only. Best local pick: R1 14B Q4 on an RTX 5070 — strong reasoning at 90+ t/s.

DeepSeek R1 Model Variants and Their Hardware Requirements

| Model | Parameters | VRAM (Q4) | Min Hardware | Speed (est.) |
|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~1 GB | Any GPU or CPU | Fast on anything |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~4 GB | 6 GB+ VRAM | 40–120 t/s on GPU |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~5 GB | 8 GB+ VRAM | 35–100 t/s on GPU |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~8 GB | 10 GB+ VRAM | 20–60 t/s on GPU |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~19 GB | 24 GB+ VRAM or 32 GB RAM | 5–20 t/s |
| DeepSeek-R1 (full) | 671B MoE | ~400 GB | Multiple A100s or H100s | Not feasible at home |

Need hardware to run DeepSeek R1 locally? Mac Mini M4 Pro (64GB) handles R1 70B at 8–12 t/s. For Windows: RTX 5070 Windforce runs R1 8B and 14B at 100+ t/s. RX 9060 XT 16G fits R1 14B in VRAM with room to spare.
The distilled variants are trained to reproduce R1's reasoning behavior in a smaller model. DeepSeek-R1-Distill-Qwen-32B performs close to the full 671B model on most benchmarks — and it runs on consumer hardware.
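
If you're sizing hardware for a variant or quant the table doesn't list, a rough rule of thumb (my own approximation fitted to the figures above, not an official number) is that Q4 weights take a bit over half a byte per parameter, plus a couple of gigabytes of headroom for the KV cache and runtime:

```bash
# Rough Q4 memory estimate: ~0.58 bytes per parameter for the weights,
# plus ~2 GB for KV cache and runtime overhead.
# Both constants are rule-of-thumb assumptions, not measured values.
estimate_q4_gb() {
  awk -v params_b="$1" 'BEGIN { printf "%.1f GB\n", params_b * 0.58 + 2 }'
}

estimate_q4_gb 7    # ~6.1 GB  -> fits an 8 GB card
estimate_q4_gb 14   # ~10.1 GB -> wants a 12 GB card
estimate_q4_gb 32   # ~20.6 GB -> wants 24 GB VRAM or unified memory
```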

Running DeepSeek R1 on Different Hardware

RTX 5070 (12 GB VRAM)

The RTX 5070 comfortably handles DeepSeek-R1-Distill-Qwen-14B at Q4 (8 GB, leaving 4 GB headroom for context). The 7B and 8B distills run fast; expect 80–120 tokens/second. The 32B distill requires CPU offloading: you'd put as many layers as fit in 12 GB on the GPU and keep the rest in system RAM, landing at roughly 8–12 t/s.
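
If you want to experiment with that split yourself, Ollama exposes a num_gpu option that caps how many layers are offloaded to the GPU. Here's a minimal sketch against the local API, assuming Ollama is already running on its default port; the prompt and the value 28 are illustrative, and you'd tune the layer count to whatever fits your card:

```bash
# Run the 32B distill with a capped number of GPU layers; whatever
# doesn't fit on the card stays in system RAM.
# The num_gpu value is illustrative -- raise or lower it until the
# model no longer spills out of VRAM.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "Explain why the sky is blue, step by step.",
  "stream": false,
  "options": { "num_gpu": 28 }
}'
```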

Mac Mini M4 Pro (24 GB Unified Memory)

The Mac Mini M4 Pro is the best consumer option for DeepSeek R1 32B. With 24 GB unified memory and 273 GB/s bandwidth, it runs the 32B distill at roughly 15–20 tokens/second — fully usable for extended reasoning sessions. The 7B distill runs at ~65 t/s.
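
Throughput figures like these vary with quant, context length, and software version, so it's worth measuring on your own machine. Ollama's --verbose flag prints timing statistics, including the evaluation rate in tokens per second, after each response:

```bash
# Print timing stats (prompt eval rate and eval rate in tokens/s)
# after each reply, so you can compare your own numbers to the
# estimates in this article
ollama run deepseek-r1:32b --verbose
```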

Mac Mini M4 (16 GB Unified Memory)

The base Mac Mini M4 handles the 7B and 8B distills well (40–50 t/s). The 14B distill runs at ~22 t/s with 16 GB. The 32B distill is too large — it would need to spill into swap, dropping performance to unusable levels.

Budget x86 Mini PCs (GMKtec, Kamrui)

The 7B distill runs on any of these mini PCs: expect 8–12 t/s on a GMKtec NucBox M5 Pro, which is slow but usable for non-interactive tasks. The 14B distill runs, but too slowly for comfortable interactive use. If you're on a budget x86 box and want DeepSeek R1-style reasoning, stick with the 7B distill.
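
On CPU-only boxes like these, throughput also depends on how many threads the runtime gets. Ollama exposes this as the num_thread option; a sketch below, where the value 8 is an assumption for an 8-core mini PC and should be set to your physical core count:

```bash
# Pin the number of CPU threads used for inference on a CPU-only mini PC.
# 8 is an assumption for an 8-core box; match it to your core count.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Summarize the rules of chess in five bullet points.",
  "stream": false,
  "options": { "num_thread": 8 }
}'
```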

How to Run DeepSeek R1 with Ollama

```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull DeepSeek R1 distill (start with 7B)
ollama pull deepseek-r1:7b

# Run it
ollama run deepseek-r1:7b

# For 14B (needs 10+ GB VRAM)
ollama pull deepseek-r1:14b

# For 32B (needs 24+ GB RAM or unified memory)
ollama pull deepseek-r1:32b
```
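
After pulling, ollama list shows the downloaded models with their on-disk sizes, and ollama rm removes a variant you no longer need:

```bash
# List downloaded models and how much disk each one takes
ollama list

# Remove a variant you no longer need to free disk space
ollama rm deepseek-r1:32b
```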

DeepSeek R1 vs Other Local Models — Is It Worth It?

| Model | Size | Reasoning Quality | Relative Speed | Best For |
|---|---|---|---|---|
| DeepSeek-R1-Distill 7B | 7B | ★★★★☆ | High | Fast reasoning tasks |
| DeepSeek-R1-Distill 32B | 32B | ★★★★★ | Medium | Complex reasoning |
| Llama 3.3 70B Q4 | 70B | ★★★★☆ | Low | General capability |
| Qwen2.5 14B | 14B | ★★★★☆ | High | Balanced performance |
| Mistral 7B | 7B | ★★★☆☆ | High | Fast, general use |

DeepSeek R1 distills are best for tasks requiring explicit reasoning chains — math problems, logical deduction, code debugging. For general conversation and writing, Llama 3.3 or Qwen2.5 may feel more natural. Run both and see what fits your workflow.

Can You Run the Full DeepSeek R1 671B at Home?

Technically possible with extreme setups — but not practical. The full 671B model uses a Mixture-of-Experts architecture where only ~37B parameters are active per token. But loading the full model requires roughly 400 GB of memory (at Q4 quantization). That means either 16× 32 GB GPUs or a machine with 512 GB+ of RAM running at CPU speeds.

The 32B distill gets you 90%+ of the reasoning quality at 1/20th the hardware requirement. The full 671B model is not worth chasing for home use in 2026.

Frequently Asked Questions

Q1: What is the best DeepSeek R1 model to run locally?

DeepSeek-R1-Distill-Qwen-32B is the best quality you can get on consumer hardware. If you have 24+ GB of unified memory (Mac Mini M4 Pro) or 32+ GB system RAM, it runs at usable speeds. For GPU setups with 12 GB VRAM, the 14B distill is the sweet spot — fits fully in VRAM and delivers excellent reasoning.

Q2: How long does DeepSeek R1 take to think before answering?

R1 generates a 'thinking' section before its final answer; this can run anywhere from 200 to 2,000 tokens. At 65 t/s (Mac Mini M4 Pro), that thinking phase takes roughly 3–30 seconds before the answer starts. At 12 t/s (budget PC), the same range takes roughly 16–160 seconds. Faster hardware makes R1 much more pleasant to use.

Q3: Is DeepSeek R1 safe to run locally?

The distilled models are open-source weights published on Hugging Face. Running them locally means your prompts never leave your machine — there's no telemetry or data collection in the model weights themselves. This is one of the key reasons to prefer local inference with models like DeepSeek R1.

Q4: How fast does DeepSeek R1 run on a Mac Mini M4 Pro?

On the Mac Mini M4 Pro (64GB), DeepSeek R1 70B at Q4 quantization runs at approximately 8–12 tokens/second using Ollama. Because R1 uses chain-of-thought reasoning, it generates internal 'thinking' tokens before the answer — a single response may produce 500–2000 thinking tokens plus the output. A complex reasoning task can take 2–5 minutes to fully complete, which is normal behavior.

Q5: What hardware do I need to run DeepSeek R1 8B locally?

DeepSeek R1 8B is the most accessible size. You need: minimum 8GB RAM for Q4 quantization (runs via CPU), or any GPU with 8GB VRAM for GPU-accelerated inference. On a Mac Mini M4 (16GB), it runs at approximately 35–45 t/s. On an RTX 5070 (12GB VRAM), it runs at approximately 100+ t/s. The 8B version is a good starting point for R1 experimentation on consumer hardware.

Q6: Is DeepSeek R1 better than Llama 3 for reasoning tasks?

For mathematical reasoning, coding, and multi-step logic problems, yes — R1 was specifically trained with reinforcement learning for reasoning and outperforms Llama 3 on most benchmarks that require step-by-step problem solving. For creative writing, general chat, and instruction following, Llama 3.1 is competitive. R1's chain-of-thought process makes it slower but often more accurate on hard problems.

Q7: Can I run DeepSeek R1 on an RTX 5070 with 12GB VRAM?

DeepSeek R1 8B fits in 12GB VRAM at Q4 quantization (~5GB) and runs at 100+ t/s, which is excellent performance. DeepSeek R1 14B also fits (~8GB at Q4). DeepSeek R1 70B does not fit in 12GB: at Q4 it needs roughly 40GB of memory, so a single card would need 48GB-class VRAM to hold it fully, and a 24GB card can run it only with heavy CPU offload. Otherwise you need a multi-GPU setup, a system with lots of RAM for CPU offload, or the Mac Mini M4 Pro with 64GB unified memory.

Q8: How do I run DeepSeek R1 with Ollama?

Install Ollama, then run: `ollama pull deepseek-r1:8b` (or `:14b`, `:70b` for larger variants). Start a chat with `ollama run deepseek-r1:8b`. You'll see the model's internal thinking process in `<think>...</think>` blocks before the answer — this is normal and intentional. For API access, query `http://localhost:11434/api/chat` with the model name `deepseek-r1:8b`.
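
As a concrete example of the API access mentioned above, the chat endpoint takes a messages array and returns JSON; the reply content typically includes the `<think>...</think>` section described above followed by the final answer. A minimal sketch, assuming Ollama is serving on its default port and the prompt is just an illustration:

```bash
# Query the local Ollama chat endpoint with the 8B distill;
# stream:false returns a single JSON object instead of streamed chunks
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    { "role": "user", "content": "Is 9.11 larger than 9.9? Think it through." }
  ],
  "stream": false
}'
```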
