What is RAG?
Retrieval-Augmented Generation — a technique that lets an LLM answer questions using external documents by fetching relevant chunks at query time instead of relying on training data alone.
Full Explanation
RAG (Retrieval-Augmented Generation) combines a vector database with an LLM to enable grounded, document-aware responses without fine-tuning. Documents are first split into chunks, converted into vectors by an embedding model, and stored in the vector database. When a query arrives, the most relevant chunks are retrieved from the vector store and injected into the prompt as context, and the LLM generates an answer based on those chunks rather than solely on its training data. Common local RAG stacks pair Ollama for inference with tools like Chroma, LanceDB, or pgvector for retrieval.
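A minimal sketch of that flow in Python, assuming the chromadb and ollama packages are installed, an Ollama server is running, and a chat model and embedding model (llama3 and nomic-embed-text here, purely as examples) have been pulled. The documents, collection name, and prompt wording are illustrative, not a fixed recipe.

```python
# Minimal local RAG sketch (illustrative): index, retrieve, generate.
# Assumes `pip install chromadb ollama`, a running Ollama server, and
# `ollama pull llama3` / `ollama pull nomic-embed-text` (example models).
import chromadb
import ollama

# 1. Index: embed each document chunk and store the vectors in Chroma.
docs = [
    "Our VPN setup guide: install WireGuard, then import the office config.",
    "Expense policy: receipts are required for purchases above 50 EUR.",
]
client = chromadb.Client()
collection = client.create_collection("handbook")
for i, doc in enumerate(docs):
    emb = ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"]
    collection.add(ids=[str(i)], embeddings=[emb], documents=[doc])

# 2. Retrieve: embed the query and pull the most similar chunks.
query = "Do I need a receipt for a 30 EUR purchase?"
q_emb = ollama.embeddings(model="nomic-embed-text", prompt=query)["embedding"]
hits = collection.query(query_embeddings=[q_emb], n_results=2)
context = "\n".join(hits["documents"][0])

# 3. Generate: inject the retrieved chunks into the prompt as context.
answer = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {query}",
    }],
)
print(answer["message"]["content"])
```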
Why It Matters for Local AI
RAG is the primary use case for local AI in enterprise and productivity contexts — querying internal documents, codebases, or knowledge bases privately. Context window size matters here: a longer context window (32K tokens or more) lets you inject more retrieved chunks without truncation, improving answer quality on complex document sets.
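As a rough illustration of that budget, the sketch below packs relevance-sorted chunks into a context window using a crude four-characters-per-token estimate. The pack_chunks helper and the reserved-token figure are illustrative assumptions, not part of any particular library; real token counts depend on the model's tokenizer.

```python
# Rough sketch: fit as many retrieved chunks as possible into a context budget.
def pack_chunks(chunks: list[str], context_window: int, reserved: int = 1024) -> list[str]:
    """Keep chunks (already sorted by relevance) until the token budget is spent.

    `reserved` leaves room for the system prompt, the question, and the answer.
    Token counts use a crude ~4 characters-per-token estimate.
    """
    budget = context_window - reserved
    packed, used = [], 0
    for chunk in chunks:
        tokens = len(chunk) // 4 + 1  # rough estimate, tokenizer-dependent
        if used + tokens > budget:
            break
        packed.append(chunk)
        used += tokens
    return packed

# A 32K-token window leaves far more room for chunks than a 4K window.
print(len(pack_chunks(["x" * 2000] * 100, context_window=32768)))  # ~63 chunks
print(len(pack_chunks(["x" * 2000] * 100, context_window=4096)))   # ~6 chunks
```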
Hardware Relevant to RAG
Mini PC · 24 GB unified memory · 273 GB/s memory bandwidth
GPU · 12 GB VRAM · 672 GB/s memory bandwidth
Related Terms
Context Window
The maximum amount of text (in tokens) a model can "see" at once. Larger context = more document history, longer conversations, bigger code files — but requires more VRAM.
Embedding Model
A model that converts text into numerical vectors for similarity search. Required for RAG pipelines. Much smaller and faster than chat LLMs — runs comfortably on CPU.
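A small sketch of what similarity search looks like once text is embedded, again assuming an Ollama server with an embedding model pulled (nomic-embed-text here as an example); the embed and cosine helpers are illustrative, not a standard API.

```python
# Sketch: similarity search over embeddings (any embedding model works the same way).
import numpy as np
import ollama

def embed(text: str) -> np.ndarray:
    # Convert text into a numerical vector using a local embedding model.
    return np.array(ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means identical direction, ~0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("How do I reset my password?")
doc_a = embed("Password reset: click 'Forgot password' on the login page.")
doc_b = embed("The cafeteria serves lunch from 11:30 to 14:00.")

# The on-topic document scores higher, so it would be retrieved first.
print(cosine(query, doc_a), cosine(query, doc_b))
```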
Ollama
Free open-source tool for running LLMs locally on macOS, Linux, and Windows. Download a model with a single command. No cloud account required. Supports Llama, Mistral, Qwen, Phi, and more.
KV Cache
Key-Value Cache — stores intermediate attention computations so the model doesn't re-process earlier context on each new token. Larger context = larger KV cache = more VRAM needed.
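A back-of-the-envelope sketch of how KV cache size grows with context length. The layer, head, and dimension figures below are assumptions typical of a Llama-3-8B-class model in FP16, and the formula ignores framework overhead, so treat the outputs as rough estimates.

```python
# Rough KV cache size estimate: 2 (keys + values) x layers x KV heads x head dim
# x bytes per value x tokens in context. Defaults approximate a Llama-3-8B-style
# model in FP16; real numbers vary by model and runtime.
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len

for tokens in (4096, 8192, 32768):
    print(f"{tokens:>6} tokens -> {kv_cache_bytes(tokens) / 2**30:.1f} GiB of KV cache")
# 4K of context needs ~0.5 GiB; 32K needs ~4 GiB on top of the model weights.
```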