Software & Frameworks

What is LoRA?

Low-Rank Adaptation — a fine-tuning technique that trains a tiny set of adapter weights instead of the full model. Runs on consumer GPUs with as little as 8 GB VRAM.

Full Explanation

LoRA (Low-Rank Adaptation) inserts small trainable matrices into a frozen base model's attention layers, updating only 0.1–1% of total parameters during training. This makes fine-tuning accessible on consumer hardware: fine-tuning a 7B model with LoRA requires roughly 10–14 GB of VRAM, versus 80+ GB for full fine-tuning. The resulting adapter file is typically 50–500 MB and is applied on top of the base model at inference time using tools like Ollama, llama.cpp, or LM Studio.
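The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real training loop: the layer width, rank, and scaling factor below are hypothetical values chosen for illustration. The frozen weight W stays untouched; only the two small matrices A and B would be trained, and their scaled product acts as the update.

```python
import numpy as np

# Hypothetical dimensions for illustration (not from any specific model config):
d_model, rank, alpha = 4096, 8, 16   # layer width, LoRA rank, LoRA scaling

rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_model))       # frozen base weight
A = rng.standard_normal((rank, d_model)) * 0.01   # trainable down-projection
B = np.zeros((d_model, rank))                     # trainable up-projection (zero init,
                                                  # so the adapter starts as a no-op)

def lora_forward(x):
    # Base layer output plus the scaled low-rank update (alpha/rank) * B @ A
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")  # ~0.39% for rank 8
```

The trainable fraction here (two rank-8 matrices against one 4096×4096 weight) lands in the 0.1–1% range the article cites; higher ranks or more adapted layers move it within that band.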

Why It Matters for Local AI

LoRA is how most hobbyists customize models: teaching a 7B base model a specific writing style, domain vocabulary, or task behavior. An RTX 5070 with 12 GB of VRAM is a practical minimum for LoRA training on 7B models; an RTX 5080 with 16 GB handles 13B models as well.
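The adapter sizes mentioned earlier follow from simple arithmetic. This back-of-envelope estimate assumes hypothetical 7B-like dimensions (32 layers, four adapted attention projections per layer, rank 16, fp16 storage); actual sizes depend on which modules you target and the rank you choose.

```python
# Back-of-envelope LoRA adapter size (all values are illustrative assumptions)
layers = 32               # transformer blocks in a 7B-class model
modules_per_layer = 4     # q/k/v/o attention projections adapted
d_model, rank = 4096, 16
bytes_per_param = 2       # fp16/bf16

# Each adapted module contributes two matrices: (rank x d_model) and (d_model x rank)
params = layers * modules_per_layer * 2 * rank * d_model
size_mb = params * bytes_per_param / 1024**2
print(f"{params:,} adapter params ≈ {size_mb:.0f} MB")
```

At these settings the adapter comes out around 32 MB; raising the rank or also adapting the MLP layers is what pushes real-world adapters into the 50–500 MB range.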

Hardware Relevant to LoRA

GIGABYTE GeForce RTX 5070 WINDFORCE OC 12G

GPU · 12 GB VRAM · 672 GB/s memory bandwidth

MSI GeForce RTX 5080 16G Gaming Trio OC

GPU · 16 GB VRAM · 960 GB/s memory bandwidth


Related Terms