How-To · 6 min read · April 22, 2026 · By Alex Voss

Running DeepSeek R1 Locally: Which Hardware Can Handle It?

DeepSeek R1 went viral for matching OpenAI o1-level reasoning at a fraction of the training cost. Now everyone wants to run it locally. The good news: the distilled variants (7B, 14B, 32B) run on consumer hardware. The bad news: the full 671B model needs enterprise-grade memory. Here's exactly what you need for each variant.

TL;DR: DeepSeek R1 7B runs on 8 GB VRAM, 14B needs 12 GB, 32B needs 24 GB. The full 671B model is cloud-only. Best local pick: R1 14B Q4 on an RTX 5070 — strong reasoning at 90+ t/s.

DeepSeek R1 Model Variants and Their Hardware Requirements

| Model | Parameters | VRAM (Q4) | Min Hardware | Speed (est.) |
|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | ~1 GB | Any GPU or CPU | Fast on anything |
| DeepSeek-R1-Distill-Qwen-7B | 7B | ~4 GB | 6 GB+ VRAM | 40–120 t/s on GPU |
| DeepSeek-R1-Distill-Llama-8B | 8B | ~5 GB | 8 GB+ VRAM | 35–100 t/s on GPU |
| DeepSeek-R1-Distill-Qwen-14B | 14B | ~8 GB | 10 GB+ VRAM | 20–60 t/s on GPU |
| DeepSeek-R1-Distill-Qwen-32B | 32B | ~19 GB | 24 GB+ VRAM or 32 GB RAM | 5–20 t/s |
| DeepSeek-R1 (full) | 671B MoE | ~400 GB | Multiple A100s or H100s | Not feasible at home |

Need hardware to run DeepSeek R1 locally? Mac Mini M4 Pro (64GB) handles R1 70B at 8–12 t/s. For Windows: RTX 5070 Windforce runs R1 8B and 14B at 100+ t/s. RX 9060 XT 16G fits R1 14B in VRAM with room to spare.
The distilled variants are trained to reproduce R1's reasoning behavior in a smaller model. DeepSeek-R1-Distill-Qwen-32B performs close to the full 671B model on most benchmarks — and it runs on consumer hardware.
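
If you're sizing hardware for a variant or quant the table doesn't list, a rough rule of thumb (my own approximation fitted to the figures above, not an official number) is that Q4 weights take a bit over half a byte per parameter, plus a couple of gigabytes of headroom for the KV cache and runtime:

```bash
# Rough Q4 memory estimate: ~0.58 bytes per parameter for the weights,
# plus ~2 GB for KV cache and runtime overhead.
# Both constants are rule-of-thumb assumptions, not measured values.
estimate_q4_gb() {
  awk -v params_b="$1" 'BEGIN { printf "%.1f GB\n", params_b * 0.58 + 2 }'
}

estimate_q4_gb 7    # ~6.1 GB  -> fits an 8 GB card
estimate_q4_gb 14   # ~10.1 GB -> wants a 12 GB card
estimate_q4_gb 32   # ~20.6 GB -> wants 24 GB VRAM or unified memory
```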

Running DeepSeek R1 on Different Hardware

RTX 5070 (12 GB VRAM)

The RTX 5070 comfortably handles DeepSeek-R1-Distill-Qwen-14B at Q4 (8 GB, leaving 4 GB headroom for context). The 7B and 8B distills run fast; expect 80–120 tokens/second. The 32B distill requires CPU offloading: you'd put as many layers as fit in 12 GB on the GPU and keep the rest in system RAM, landing at roughly 8–12 t/s.
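
If you want to experiment with that split yourself, Ollama exposes a num_gpu option that caps how many layers are offloaded to the GPU. Here's a minimal sketch against the local API, assuming Ollama is already running on its default port; the prompt and the value 28 are illustrative, and you'd tune the layer count to whatever fits your card:

```bash
# Run the 32B distill with a capped number of GPU layers; whatever
# doesn't fit on the card stays in system RAM.
# The num_gpu value is illustrative -- raise or lower it until the
# model no longer spills out of VRAM.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "Explain why the sky is blue, step by step.",
  "stream": false,
  "options": { "num_gpu": 28 }
}'
```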

Mac Mini M4 Pro (24 GB Unified Memory)

The Mac Mini M4 Pro is the best consumer option for DeepSeek R1 32B. With 24 GB unified memory and 273 GB/s bandwidth, it runs the 32B distill at roughly 15–20 tokens/second — fully usable for extended reasoning sessions. The 7B distill runs at ~65 t/s.
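
Throughput figures like these vary with quant, context length, and software version, so it's worth measuring on your own machine. Ollama's --verbose flag prints timing statistics, including the evaluation rate in tokens per second, after each response:

```bash
# Print timing stats (prompt eval rate and eval rate in tokens/s)
# after each reply, so you can compare your own numbers to the
# estimates in this article
ollama run deepseek-r1:32b --verbose
```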

Mac Mini M4 (16 GB Unified Memory)

The base Mac Mini M4 handles the 7B and 8B distills well (40–50 t/s). The 14B distill runs at ~22 t/s with 16 GB. The 32B distill is too large — it would need to spill into swap, dropping performance to unusable levels.

Budget x86 Mini PCs (GMKtec, Kamrui)

The 7B distill runs on any of these mini PCs: expect 8–12 t/s on a GMKtec NucBox M5 Pro, which is slow but usable for non-interactive tasks. The 14B distill runs, but too slowly for comfortable interactive use. If you're on a budget x86 box and want DeepSeek R1-style reasoning, stick with the 7B distill.
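
On CPU-only boxes like these, throughput also depends on how many threads the runtime gets. Ollama exposes this as the num_thread option; a sketch below, where the value 8 is an assumption for an 8-core mini PC and should be set to your physical core count:

```bash
# Pin the number of CPU threads used for inference on a CPU-only mini PC.
# 8 is an assumption for an 8-core box; match it to your core count.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Summarize the rules of chess in five bullet points.",
  "stream": false,
  "options": { "num_thread": 8 }
}'
```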

How to Run DeepSeek R1 with Ollama

```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull DeepSeek R1 distill (start with 7B)
ollama pull deepseek-r1:7b

# Run it
ollama run deepseek-r1:7b

# For 14B (needs 10+ GB VRAM)
ollama pull deepseek-r1:14b

# For 32B (needs 24+ GB RAM or unified memory)
ollama pull deepseek-r1:32b
```
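
After pulling, ollama list shows the downloaded models with their on-disk sizes, and ollama rm removes a variant you no longer need:

```bash
# List downloaded models and how much disk each one takes
ollama list

# Remove a variant you no longer need to free disk space
ollama rm deepseek-r1:32b
```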

DeepSeek R1 vs Other Local Models — Is It Worth It?

| Model | Size | Reasoning Quality | Relative Speed | Best For |
|---|---|---|---|---|
| DeepSeek-R1-Distill 7B | 7B | ★★★★☆ | High | Fast reasoning tasks |
| DeepSeek-R1-Distill 32B | 32B | ★★★★★ | Medium | Complex reasoning |
| Llama 3.3 70B Q4 | 70B | ★★★★☆ | Low | General capability |
| Qwen2.5 14B | 14B | ★★★★☆ | High | Balanced performance |
| Mistral 7B | 7B | ★★★☆☆ | High | Fast, general use |

DeepSeek R1 distills are best for tasks requiring explicit reasoning chains — math problems, logical deduction, code debugging. For general conversation and writing, Llama 3.3 or Qwen2.5 may feel more natural. Run both and see what fits your workflow.

Can You Run the Full DeepSeek R1 671B at Home?

Technically possible with extreme setups — but not practical. The full 671B model uses a Mixture-of-Experts architecture where only ~37B parameters are active per token. But loading the full model requires roughly 400 GB of memory (at Q4 quantization). That means either 16× 32 GB GPUs or a machine with 512 GB+ of RAM running at CPU speeds.

The 32B distill gets you 90%+ of the reasoning quality at 1/20th the hardware requirement. The full 671B model is not worth chasing for home use in 2026.

Frequently Asked Questions

Q1: What is the best DeepSeek R1 model to run locally?

DeepSeek-R1-Distill-Qwen-32B is the best quality you can get on consumer hardware. If you have 24+ GB of unified memory (Mac Mini M4 Pro) or 32+ GB system RAM, it runs at usable speeds. For GPU setups with 12 GB VRAM, the 14B distill is the sweet spot — fits fully in VRAM and delivers excellent reasoning.

Q2: How long does DeepSeek R1 take to think before answering?

R1 generates a 'thinking' section before its final answer; this can run anywhere from 200 to 2,000 tokens. At 65 t/s (Mac Mini M4 Pro), that thinking phase takes roughly 3–30 seconds before the answer starts. At 12 t/s (budget PC), the same range takes roughly 16–160 seconds. Faster hardware makes R1 much more pleasant to use.

Q3: Is DeepSeek R1 safe to run locally?

The distilled models are open-source weights published on Hugging Face. Running them locally means your prompts never leave your machine — there's no telemetry or data collection in the model weights themselves. This is one of the key reasons to prefer local inference with models like DeepSeek R1.

Q4: How fast does DeepSeek R1 run on a Mac Mini M4 Pro?

On the Mac Mini M4 Pro (64GB), DeepSeek R1 70B at Q4 quantization runs at approximately 8–12 tokens/second using Ollama. Because R1 uses chain-of-thought reasoning, it generates internal 'thinking' tokens before the answer — a single response may produce 500–2000 thinking tokens plus the output. A complex reasoning task can take 2–5 minutes to fully complete, which is normal behavior.

Q5: What hardware do I need to run DeepSeek R1 8B locally?

DeepSeek R1 8B is the most accessible size. You need: minimum 8GB RAM for Q4 quantization (runs via CPU), or any GPU with 8GB VRAM for GPU-accelerated inference. On a Mac Mini M4 (16GB), it runs at approximately 35–45 t/s. On an RTX 5070 (12GB VRAM), it runs at approximately 100+ t/s. The 8B version is a good starting point for R1 experimentation on consumer hardware.

Q6: Is DeepSeek R1 better than Llama 3 for reasoning tasks?

For mathematical reasoning, coding, and multi-step logic problems, yes — R1 was specifically trained with reinforcement learning for reasoning and outperforms Llama 3 on most benchmarks that require step-by-step problem solving. For creative writing, general chat, and instruction following, Llama 3.1 is competitive. R1's chain-of-thought process makes it slower but often more accurate on hard problems.

Q7: Can I run DeepSeek R1 on an RTX 5070 with 12GB VRAM?

DeepSeek R1 8B fits in 12GB VRAM at Q4 quantization (~5GB) and runs at 100+ t/s, which is excellent performance. DeepSeek R1 14B also fits (~8GB at Q4). DeepSeek R1 70B does not fit in 12GB: at Q4 it needs roughly 40GB of memory, so a single card would need 48GB-class VRAM to hold it fully, and a 24GB card can run it only with heavy CPU offload. Otherwise you need a multi-GPU setup, a system with lots of RAM for CPU offload, or the Mac Mini M4 Pro with 64GB unified memory.

Q8: How do I run DeepSeek R1 with Ollama?

Install Ollama, then run: `ollama pull deepseek-r1:8b` (or `:14b`, `:70b` for larger variants). Start a chat with `ollama run deepseek-r1:8b`. You'll see the model's internal thinking process in `<think>...</think>` blocks before the answer — this is normal and intentional. For API access, query `http://localhost:11434/api/chat` with the model name `deepseek-r1:8b`.
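
As a concrete example of the API access mentioned above, the chat endpoint takes a messages array and returns JSON; the reply content typically includes the `<think>...</think>` section described above followed by the final answer. A minimal sketch, assuming Ollama is serving on its default port and the prompt is just an illustration:

```bash
# Query the local Ollama chat endpoint with the 8B distill;
# stream:false returns a single JSON object instead of streamed chunks
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    { "role": "user", "content": "Is 9.11 larger than 9.9? Think it through." }
  ],
  "stream": false
}'
```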
