What is INT4?
4-bit integer quantization — the most common precision level for running large models on consumer hardware. Reduces model size by ~75% vs FP16 with acceptable quality loss for most tasks.
Full Explanation
INT4 (4-bit integer) quantization represents each model weight as a 4-bit integer instead of a 16-bit float, reducing memory footprint by roughly 75%. A 7B model that requires ~14 GB at FP16 precision fits in ~4 GB at INT4. Quality degrades measurably compared to FP16 — especially on math, code, and precise factual recall — but remains acceptable for conversational and summarization tasks. INT4 is the default quantization level in most consumer inference setups, corresponding to Q4_K_M in GGUF or 4-bit in AWQ/EXL2.
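To make the idea concrete, below is a minimal NumPy sketch of group-wise symmetric 4-bit quantization. It is not how GGUF's Q4_K_M format works internally (real formats pack two 4-bit values per byte and store extra per-block scales and minimums); the function names here are purely illustrative.

# Illustrative sketch of symmetric 4-bit weight quantization with one
# scale per group of weights. Values are kept in int8 for clarity; real
# implementations pack two 4-bit values into each byte.
import numpy as np

def quantize_int4(weights, group_size=32):
    w = weights.reshape(-1, group_size)
    # Map each group onto the signed 4-bit range; use -7..7 symmetrically.
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True) / 7.0, 1e-8)
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

# 1,024 FP16 weights take ~2 KB; the same weights as packed INT4 take ~0.5 KB plus scales.
w = np.random.randn(1024).astype(np.float16)
q, scale = quantize_int4(w.astype(np.float32))
w_restored = dequantize_int4(q, scale)
print("max round-trip error:", np.abs(w.astype(np.float32) - w_restored).max())

The round-trip error printed at the end is the "quality loss" the glossary entry refers to: each weight is recovered only approximately, which is why INT4 models score slightly worse on precision-sensitive tasks.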
Why It Matters for Local AI
INT4 is the reason you can run a 13B model on a 12 GB GPU or a 70B model on 48 GB of unified memory. It's the practical enabler of local AI on consumer hardware. For most chat and productivity use cases, the quality difference from FP16 is rarely noticeable. For code generation or precise math, consider INT8 or FP16 if your hardware can fit the larger model.
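Those memory figures follow directly from bits per weight. Here is a rough, weights-only estimator (a hypothetical helper, not part of any library); the KV cache, activations, and runtime buffers add overhead on top, and real Q4 formats store per-group scales that add roughly half a bit per weight:

# Back-of-the-envelope weight memory at different precisions (decimal GB).
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}

def weight_memory_gb(params_billions, precision):
    bits = params_billions * 1e9 * BITS_PER_WEIGHT[precision]
    return bits / 8 / 1e9

for size in (7, 13, 70):
    print(f"{size}B:", {p: f"{weight_memory_gb(size, p):.1f} GB" for p in BITS_PER_WEIGHT})

This reproduces the numbers above: a 7B model is ~14 GB at FP16 and ~3.5 GB at INT4 (closer to ~4 GB once scales and metadata are included), a 13B model at INT4 leaves headroom on a 12 GB GPU, and a 70B model at INT4 comes in around 35 GB, fitting 48 GB of unified memory.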
Hardware Relevant to INT4
GPU · 12 GB VRAM · 672 GB/s memory bandwidth
Mini-PC · 24 GB unified memory · 273 GB/s memory bandwidth
Related Terms
Quantization→
Compressing a model by reducing numeric precision. Q4 = 4-bit (smallest, fastest), Q8 = 8-bit (balanced), FP16 = full precision. Fewer bits mean less VRAM is required, at the cost of a slight quality reduction.
GGUF→
The standard file format for quantized LLMs used by llama.cpp and Ollama. Replaces the older GGML format. Stores model weights and metadata in a single portable file.
VRAM→
Video RAM — dedicated memory on a GPU. Determines the maximum model size you can run with full GPU acceleration. Once a model exceeds VRAM, it spills to system RAM over the slow PCIe bus.
Max LLM Size→
The largest language model this hardware can run with full GPU/unified-memory acceleration, at the specified quantization. Larger models require more memory.
AWQ→
Activation-Aware Weight Quantization — a 4-bit quantization method that outperforms GGUF Q4 in quality by identifying and preserving the most important weights. Primarily used with vLLM and Hugging Face Transformers.