Software & Frameworks

What is LoRA?

Low-Rank Adaptation — a fine-tuning technique that trains a tiny set of adapter weights instead of the full model. Runs on consumer GPUs with as little as 8 GB VRAM.

Full Explanation

LoRA (Low-Rank Adaptation) inserts small trainable matrices into a frozen base model's attention layers, updating only 0.1–1% of total parameters during training. This makes fine-tuning accessible on consumer hardware: fine-tuning a 7B model with LoRA requires roughly 10–14 GB of VRAM, versus 80+ GB for full fine-tuning. The resulting adapter file is typically 50–500 MB and is applied on top of the base model at inference time using tools like Ollama, llama.cpp, or LM Studio.
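The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real training loop: the layer width, rank, and scaling factor below are hypothetical values chosen for illustration. The frozen weight W stays untouched; only the two small matrices A and B would be trained, and their scaled product acts as the update.

```python
import numpy as np

# Hypothetical dimensions for illustration (not from any specific model config):
d_model, rank, alpha = 4096, 8, 16   # layer width, LoRA rank, LoRA scaling

rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_model))       # frozen base weight
A = rng.standard_normal((rank, d_model)) * 0.01   # trainable down-projection
B = np.zeros((d_model, rank))                     # trainable up-projection (zero init,
                                                  # so the adapter starts as a no-op)

def lora_forward(x):
    # Base layer output plus the scaled low-rank update (alpha/rank) * B @ A
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")  # ~0.39% for rank 8
```

The trainable fraction here (two rank-8 matrices against one 4096×4096 weight) lands in the 0.1–1% range the article cites; higher ranks or more adapted layers move it within that band.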

Why It Matters for Local AI

LoRA is how most hobbyists customize models: teaching a 7B base model a specific writing style, domain vocabulary, or task behavior. An RTX 5070 with 12 GB of VRAM is a practical minimum for LoRA training on 7B models; an RTX 5080 with 16 GB handles 13B models as well.
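The adapter sizes mentioned earlier follow from simple arithmetic. This back-of-envelope estimate assumes hypothetical 7B-like dimensions (32 layers, four adapted attention projections per layer, rank 16, fp16 storage); actual sizes depend on which modules you target and the rank you choose.

```python
# Back-of-envelope LoRA adapter size (all values are illustrative assumptions)
layers = 32               # transformer blocks in a 7B-class model
modules_per_layer = 4     # q/k/v/o attention projections adapted
d_model, rank = 4096, 16
bytes_per_param = 2       # fp16/bf16

# Each adapted module contributes two matrices: (rank x d_model) and (d_model x rank)
params = layers * modules_per_layer * 2 * rank * d_model
size_mb = params * bytes_per_param / 1024**2
print(f"{params:,} adapter params ≈ {size_mb:.0f} MB")
```

At these settings the adapter comes out around 32 MB; raising the rank or also adapting the MLP layers is what pushes real-world adapters into the 50–500 MB range.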

Hardware Relevant to LoRA

GIGABYTE GeForce RTX 5070 WINDFORCE OC 12G

GPU · 12 GB VRAM · 672 GB/s memory bandwidth

MSI GeForce RTX 5080 16G Gaming Trio OC

GPU · 16 GB VRAM · 960 GB/s memory bandwidth


Related Terms