Run Llama 3.3 70B on Mac Mini M4 Pro
Complete guide to running Llama 3.3 70B (Q4) locally on the Mac Mini M4 Pro with 24 GB unified memory using Ollama.
Speed
8–12 tok/s
Min Memory
24 GB
Software
Ollama, macOS 14+
Step-by-Step Setup
- 01
Install Ollama
Download and install Ollama for macOS. The installer creates a menu-bar icon and background inference server.
ollama --version
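If you prefer the command line over the .dmg installer, Ollama is also available through Homebrew (assuming Homebrew is already set up). Installed this way there is no menu-bar app, so the server is started manually:

```shell
# Alternative install via Homebrew
brew install ollama

# Start the inference server in the foreground (listens on port 11434);
# the .dmg install runs this for you in the background instead
ollama serve
```

Either route gives you the same `ollama` CLI and the same local API.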
- 02
Pull Llama 3.3 70B
The Q4_K_M quantized model is about 40 GB, which is larger than the 24 GB of unified memory. On Apple Silicon there is no separate system RAM: CPU and GPU share one pool, so weights that don't fit stay memory-mapped on the SSD and are paged in as needed. Throughput drops versus a machine that holds the whole model in memory, but it is still excellent compared to a pure CPU box.
ollama pull llama3.3:70b
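Once the pull finishes, you can confirm the download and check how much disk it uses. The path below is Ollama's default model store; adjust it if you have set `OLLAMA_MODELS`:

```shell
# List pulled models and their on-disk sizes
ollama list

# Total disk usage of Ollama's default model store
du -sh ~/.ollama/models
```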
- 03
Verify GPU layers
Check that Ollama is loading layers onto the Metal GPU rather than falling back to pure CPU. The layer-offload message goes to the server log, not to `ollama run` output, so the quickest check is the PROCESSOR column of `ollama ps` while the model is loaded.
ollama run llama3.3:70b "hi" --verbose
ollama ps
- 04
Run your first prompt
Start inference. Expect roughly 8–12 tokens/s, in the same ballpark as a single RTX 4090, which also has 24 GB of memory and must offload part of a 40 GB model.
ollama run llama3.3:70b "Write a haiku about silicon"
- 05
Point any OpenAI-compatible app at Ollama
Use base URL http://localhost:11434/v1 and any model name. No API key required.
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama
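You can exercise the endpoint without any SDK by POSTing to the chat completions route directly with curl. The prompt below is illustrative, and the API key value is ignored by Ollama (some clients simply require the header to be present):

```shell
# JSON request body; the model name matches the one pulled above
BODY='{"model": "llama3.3:70b", "messages": [{"role": "user", "content": "Say hello in five words."}]}'

# POST to Ollama's OpenAI-compatible chat completions endpoint
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d "$BODY"
```

The response follows the standard OpenAI chat completion JSON shape, so existing client code needs no changes beyond the base URL.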
Optimization Tips
- The M4 Pro's 273 GB/s memory bandwidth is why 70B is viable at all: LLM token generation is bound by memory bandwidth more than by raw FLOPS.
- If you hit memory pressure, close Safari and other heavy apps; macOS will reclaim RAM for Ollama automatically.
- Llama 3.3 70B approaches Llama 3.1 405B on many benchmarks, so the 70B sweet spot is real.
- Set `OLLAMA_NUM_PARALLEL=1` to dedicate all memory to one request for maximum single-session speed.
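An `export` in your shell only reaches processes started from that shell, not the menu-bar app. For GUI installs, Ollama's FAQ recommends setting variables with `launchctl` (shown here for `OLLAMA_NUM_PARALLEL`, on the assumption it is read the same way as other Ollama variables):

```shell
# Make the setting visible to the macOS GUI app, not just terminal sessions
launchctl setenv OLLAMA_NUM_PARALLEL 1

# Then quit and reopen Ollama from the menu bar so it picks up the change
```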
Related Guides
Run Llama 3.1 8B on Mac Mini M4
Step-by-step guide to running Llama 3.1 8B locally on the Apple Mac Mini M4 using Ollama — no GPU required.
Run Stable Diffusion on Mac Mini M4
How to run SDXL and FLUX on the Mac Mini M4 using Diffusers or ComfyUI — with expected generation times and optimization tips.