
What is RAG?

Retrieval-Augmented Generation: a technique that lets an LLM answer questions from external documents by fetching relevant chunks at query time instead of relying on its training data alone.

Full Explanation

RAG (Retrieval-Augmented Generation) combines a vector database with an LLM to enable grounded, document-aware responses without fine-tuning. When a query arrives, relevant text chunks are retrieved from the vector store and injected into the prompt as context. The LLM then generates an answer based on those retrieved chunks rather than solely on its training data. Common local RAG stacks pair Ollama for inference with tools like Chroma, LanceDB, or pgvector for retrieval.
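The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy, self-contained illustration: the `embed`, `cosine`, `retrieve`, and `build_prompt` helpers are hypothetical names, and a bag-of-words similarity stands in for the neural embedding model and vector store (e.g. Chroma) a real stack would use.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. Real RAG uses a neural embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Inject the retrieved chunks into the prompt as grounding context.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Ollama runs LLM inference locally on your machine.",
    "Chroma is a vector database for storing embeddings.",
    "RAG injects retrieved chunks into the prompt at query time.",
]
prompt = build_prompt("How does RAG use retrieved chunks?", docs)
print(prompt)
```

In a real deployment, the final prompt would be sent to a local inference server such as Ollama; the LLM never sees the full document set, only the retrieved chunks.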

Why It Matters for Local AI

RAG is the primary use case for local AI in enterprise and productivity contexts — querying internal documents, codebases, or knowledge bases privately. Context window size matters here: a longer context window (32K tokens or more) lets you inject more retrieved chunks without truncation, improving answer quality on complex document sets.
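Staying within the context window usually means packing retrieved chunks greedily until a token budget is spent. A minimal sketch, assuming chunks arrive already ranked best-first and using a whitespace word count as a stand-in for a real tokenizer (`pack_chunks` and `count_tokens` are hypothetical names):

```python
def pack_chunks(ranked_chunks: list[str], budget_tokens: int,
                count_tokens=lambda s: len(s.split())) -> list[str]:
    # Add chunks in ranked order until the next one would exceed the budget.
    packed, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            break
        packed.append(chunk)
        used += cost
    return packed

chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(pack_chunks(chunks, budget_tokens=5))  # first two chunks fit (3 + 2 tokens)
```

With a 32K-token window the budget is large enough that truncation rarely bites; with a 4K window, the same ranked list may lose its lower-ranked chunks, which is why window size affects answer quality.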

Hardware Relevant to RAG

Apple Mac Mini (M4 Pro, 2024)

mini-pc · 24 GB unified memory · 273 GB/s memory bandwidth

GIGABYTE GeForce RTX 5070 WINDFORCE OC 12G

gpu · 12 GB VRAM · 672 GB/s memory bandwidth


Related Terms