Language Model · 8B / 70B

Run DeepSeek R1 Locally with Ollama

How to run DeepSeek R1 (8B and 70B) locally on your own hardware using Ollama — hardware requirements, speed expectations, and tips.

Speed

20–28 tok/s (8B on M4 Pro)

Min Memory

8 GB

Software

Ollama

Hardware Used in This Guide

Apple Mac Mini (M4 Pro, 2024)

mini-pc

Step-by-Step Setup

  1. Install Ollama

    Ollama supports DeepSeek R1 out of the box via its model registry. Install on macOS, Linux, or Windows.

    ollama --version
  2. Pull DeepSeek R1

    Choose the size that fits your hardware. The 8B model needs ~5 GB VRAM; the 70B needs ~40 GB (GPU + RAM offload).

    # 8B — best for 8–16 GB systems
    ollama pull deepseek-r1:8b
    
    # 70B — best for 24 GB+ unified or multi-GPU
    ollama pull deepseek-r1:70b
  3. Run with extended context

    DeepSeek R1 uses chain-of-thought reasoning that produces long outputs. Increase the context window for complex tasks.

    ollama run deepseek-r1:8b
    >>> /set parameter num_ctx 16384
    >>> Solve: if x² + 2x - 8 = 0, find x
  4. Use the thinking tags

    DeepSeek R1's outputs include <think> blocks showing the reasoning chain. These are normal — the final answer follows after.
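The extended context from step 3 can also be set per request through Ollama's local HTTP API (default endpoint `http://localhost:11434`) via the `num_ctx` option. A minimal sketch; the prompt here is just the example from step 3:

```python
import json

# Build a /api/generate request body that raises the context window
# for a single request via the num_ctx option.
payload = {
    "model": "deepseek-r1:8b",
    "prompt": "Solve: if x² + 2x - 8 = 0, find x",
    "stream": False,
    "options": {"num_ctx": 16384},  # same effect as /set parameter num_ctx
}

body = json.dumps(payload).encode("utf-8")

# With the Ollama server running, send it like this:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```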
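If you consume R1's output programmatically, you will usually want to separate the `<think>` reasoning from the final answer. A small sketch (the sample completion below is made up; real outputs are typically much longer):

```python
import re

# Matches a single <think>...</think> block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw DeepSeek R1 completion."""
    match = THINK_RE.search(output)
    reasoning = match.group(1).strip() if match else ""
    answer = THINK_RE.sub("", output).strip()
    return reasoning, answer

sample = (
    "<think>Factor: x² + 2x - 8 = (x + 4)(x - 2).</think>\n"
    "x = -4 or x = 2"
)
reasoning, answer = split_reasoning(sample)
print(answer)  # x = -4 or x = 2
```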

Optimization Tips

  • DeepSeek R1 8B (a Llama-3.1 distillation of the full R1 model) performs competitively with much larger models on reasoning benchmarks, making it one of the strongest local reasoning models at this size.

  • The <think> reasoning tokens count toward your context window — increase it for complex multi-step problems.

  • On Apple Silicon, R1 8B runs at ~20–28 tok/s; on RTX 5070, expect ~55–65 tok/s for the 8B variant.

  • R1 70B will not fit entirely in VRAM on most consumer GPUs; Ollama splits layers between the GPU and system RAM, so ensure ≥ 64 GB of system RAM alongside your GPU.
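The memory figures above can be sanity-checked with a back-of-envelope calculation. This sketch assumes Ollama's default ~4-bit quantization (q4_K_M, roughly 4.5 bits per weight) and ignores KV-cache and runtime overhead, which is why the real footprint runs a bit higher:

```python
def approx_model_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough on-disk/VRAM size of a quantized model, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(round(approx_model_gb(8), 1))   # 4.5 -> in line with the ~5 GB figure
print(round(approx_model_gb(70), 1))  # 39.4 -> in line with the ~40 GB figure
```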

Other Hardware for DeepSeek R1

GIGABYTE GeForce RTX 5070 WINDFORCE OC 12G

gpu · 12 GB VRAM
