What is MoE?
Mixture of Experts — a model architecture where only a fraction of parameters activate per token. Enables very large parameter counts at lower inference cost (e.g., DeepSeek-V3, Mixtral).
Full Explanation
Mixture of Experts (MoE) splits a model's feed-forward layers into multiple "expert" sub-networks, routing each token through only a small, fixed number of them (2 of 8 in Mixtral; 8 of 256 in DeepSeek-V3) instead of the full set. This is how a 671B-parameter MoE model like DeepSeek-V3 activates only ~37B parameters per token, giving quality approaching its full 671B scale at roughly the per-token compute cost of a 37B dense model. However, all 671B parameters must still reside in memory, requiring massive VRAM or unified memory for full GPU acceleration.
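To make the routing concrete, here is a minimal top-k routing sketch in NumPy. The names (`moe_layer`, `gate_w`) are illustrative, not taken from any particular framework:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Route one token through its top_k experts (illustrative sketch).

    x      : (d_model,) token hidden state
    gate_w : (d_model, n_experts) router weights
    experts: list of callables, each a small feed-forward network
    """
    logits = x @ gate_w                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k best experts
    w = np.exp(logits[top])
    w /= w.sum()                           # softmax over the chosen experts only
    # Only top_k expert networks execute; the rest of the parameters
    # sit idle in memory, which is the whole MoE trade-off.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# Toy usage: 8 experts, 2 active per token (Mixtral-style top-2 routing).
rng = np.random.default_rng(0)
d, n = 64, 8
gate_w = rng.normal(size=(d, n))
experts = [lambda v, W=rng.normal(size=(d, d)) * 0.1: np.tanh(v @ W)
           for _ in range(n)]
y = moe_layer(rng.normal(size=d), gate_w, experts, top_k=2)
```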
Why It Matters for Local AI
MoE models are memory-hungry but compute-efficient. A 46B-parameter MoE model like Mixtral 8x7B needs ~48 GB of memory at 8-bit quantization for full acceleration, making the Mac Mini M4 Pro with 48 GB unified memory one of the few sub-$2,000 systems that can hold it. Smaller models (e.g., the DeepSeek-R1 distills, which are dense rather than MoE) are far more accessible.
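A quick back-of-the-envelope check on that ~48 GB figure. This is a sketch using the simple weights-only formula params × bits / 8; the helper name is mine, and real runtimes add KV cache and activation overhead on top:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only footprint: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8

# Mixtral 8x7B has ~46.7B total parameters.
for label, bits in [("Q4", 4), ("Q8", 8), ("FP16", 16)]:
    print(f"Mixtral 8x7B @ {label}: ~{weight_memory_gb(46.7, bits):.0f} GB")
# Q8 gives ~47 GB of weights, which lands at the ~48 GB figure above
# once a little cache and runtime overhead is added.
```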
Hardware Relevant to MoE
mini-pc · 24 GB Unified · 273 GB/s
gpu · 16 GB VRAM · 960 GB/s
Related Terms
Quantization
Compressing a model by reducing numeric precision. Q4 = 4-bit (smallest, fastest), Q8 = 8-bit (balanced), FP16 = full precision. Fewer bits = less VRAM required, with a slight quality reduction.
VRAM
Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.
Unified Memory
Apple Silicon uses a single pool of fast RAM shared between CPU and GPU. Larger unified memory = larger models run entirely at full bandwidth — no PCIe bottleneck.
Max LLM Size
The largest language model this hardware can run with full GPU/unified-memory acceleration, at the specified quantization. Larger models require more memory.
KV Cache
Key-Value Cache — stores intermediate attention computations so the model doesn't re-process earlier context on each new token. Larger context = larger KV cache = more VRAM needed.
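A rough sizing sketch for the KV Cache entry above, using the standard formula 2 × layers × KV heads × head dim × context length × bytes per element. The model shape below is an assumption in the style of a 7B Llama, not a spec sheet:

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size: 2 (K and V) x layers x KV heads x head dim
    x context length x bytes per element (2 for FP16)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Assumed 7B-class shape: 32 layers, 32 KV heads, head_dim 128.
for ctx in (4_096, 32_768):
    print(f"{ctx:>6} tokens -> ~{kv_cache_gb(32, 32, 128, ctx):.1f} GB")
# Output: ~2.1 GB at 4K context vs ~17.2 GB at 32K -- the cache grows
# linearly with context, which is why long contexts demand so much VRAM.
```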