What Is a Context Window?
The maximum amount of text (in tokens) a model can "see" at once. Larger context = more document history, longer conversations, bigger code files — but requires more VRAM.
Full Explanation
A model's context window is the maximum number of tokens it can attend to at once, counting both your input prompt and the output it generates. Llama 3.1 supports up to 128K tokens (roughly 96,000 words). Every additional token in the context increases VRAM consumption, because the KV cache grows with it: running a 128K-token context can require significantly more memory than the model weights alone, often doubling total VRAM usage. Most local AI setups cap context at 4K–8K tokens to keep memory consumption manageable.
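As a rough illustration of how context length drives memory, the sketch below estimates KV-cache size from a model's attention dimensions. The figures used (32 layers, 8 KV heads, head dimension 128, FP16 cache) are assumed Llama-3.1-8B-style values chosen for the example, not measured numbers.

    # Rough KV-cache estimate: 2 (keys + values) x layers x kv_heads
    # x head_dim x context_length x bytes_per_element.
    # Model dimensions below are assumed Llama-3.1-8B-style values.
    def kv_cache_bytes(context_len: int,
                       n_layers: int = 32,
                       n_kv_heads: int = 8,
                       head_dim: int = 128,
                       bytes_per_elem: int = 2) -> int:  # 2 bytes = FP16 cache
        return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

    for ctx in (4_096, 8_192, 32_768, 131_072):
        print(f"{ctx:>7} tokens -> ~{kv_cache_bytes(ctx) / 1024**3:.1f} GiB of KV cache")

Under these assumptions the cache grows from about 0.5 GiB at 4K tokens to roughly 16 GiB at 128K, which is why a long context can dwarf the quantized weights themselves.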
Why It Matters for Local AI
For coding assistants working on large codebases, a 32K+ context window is transformative; for simple Q&A, 4K is usually sufficient. Check whether your hardware supports your target context size: a 13B model at 32K context can require 24+ GB of VRAM, putting it out of reach of 12 GB cards.
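If you run GGUF models through llama-cpp-python, the context size is fixed when the model is loaded, which is where this trade-off is made concrete. A minimal sketch, assuming a local Llama 3.1 GGUF file (the path is a hypothetical example):

    from llama_cpp import Llama

    # n_ctx sets the context window; llama.cpp reserves KV-cache memory
    # for the whole window at load time. The path is a hypothetical example.
    llm = Llama(
        model_path="./llama-3.1-8b-instruct.Q4_K_M.gguf",
        n_ctx=8192,        # target context window in tokens
        n_gpu_layers=-1,   # offload all layers to the GPU if they fit
    )

    out = llm("Summarize the context window trade-off in one sentence.",
              max_tokens=64)
    print(out["choices"][0]["text"])

If the model plus its KV cache does not fit in VRAM, reduce n_ctx or pick a smaller quantization until it does.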
Hardware Relevant to Context Window
Mini PC · 24 GB unified memory · 273 GB/s memory bandwidth
GPU · 12 GB VRAM · 672 GB/s memory bandwidth
Related Terms
VRAM
Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.
Quantization
Compressing a model by reducing numeric precision. Q4 = 4-bit (smallest, fastest), Q8 = 8-bit (balanced), FP16 = full precision. Fewer bits mean less VRAM is required, at a slight cost in quality (see the rough size sketch after this list).
KV Cache
Key-Value Cache — stores intermediate attention computations so the model doesn't re-process earlier context on each new token. Larger context = larger KV cache = more VRAM needed.
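As a back-of-the-envelope complement to the quantization entry above, the sketch below estimates the weights-only footprint at different precisions. The bytes-per-parameter figures (Q4 ≈ 0.5, Q8 ≈ 1, FP16 = 2) are approximations; real GGUF files vary because layers are quantized at mixed precisions, and the KV cache comes on top of these numbers.

    # Approximate weights-only footprint at different quantization levels.
    # Bytes-per-parameter values are rough averages, not exact GGUF sizes.
    BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

    for params_billions in (7, 13, 70):
        for quant, bpp in BYTES_PER_PARAM.items():
            gib = params_billions * 1e9 * bpp / 1024**3
            print(f"{params_billions:>3}B model at {quant:>4}: ~{gib:6.1f} GiB of weights")

This weights figure is the one to compare against your card's VRAM before adding the context window's KV cache on top.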