Techniques

Quantization

Also known as: quantisation

In one line

Compressing a model by using lower-precision numbers, so it runs faster and fits on smaller hardware.

What does Quantization mean?

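Quantization stores a model's weights (and sometimes its activations) in fewer bits than the 16- or 32-bit formats typically used in training. Quantizing a 70B-parameter model from 16-bit to 4-bit shrinks its weights roughly 4x, from about 140 GB to about 35 GB, usually with modest quality loss. That is small enough for a high-memory laptop, and smaller models quantized the same way can fit on edge devices.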
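To make "lower-precision numbers" concrete, here is a minimal sketch of symmetric 4-bit quantization in plain Python. It assumes a single shared scale for all values; the function names quantize_symmetric and dequantize are hypothetical, and production schemes typically use finer-grained scales (per channel or per group) and pack two 4-bit codes into each byte.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale.

    Illustrative assumption: one scale for the whole list of weights.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.32, 0.11, 0.00, -0.70]
codes, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(codes, scale)

# codes == [7, -3, 1, 0, -7]; each restored value is within scale / 2 of the original
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

The rounding error per value is bounded by half the scale, which is where the "modest quality loss" comes from: fewer bits mean a coarser grid of representable values.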

A real-world example

Running Llama 3.1 8B in 4-bit quantized form on a MacBook.
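A back-of-the-envelope calculation shows why this fits. The sketch below assumes a nominal 8 billion parameters and counts only the raw weight bits; real quantized files are somewhat larger because they also store scales and metadata, and running the model needs additional memory for activations and the KV cache. The helper name weight_footprint_gb is hypothetical.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Counts raw weight bits only; ignores scales, metadata, and runtime memory.
    """
    return num_params * bits_per_weight / 8 / 1e9


params_8b = 8e9  # nominal parameter count (assumption based on the model name)

fp16_gb = weight_footprint_gb(params_8b, 16)  # 16.0
int4_gb = weight_footprint_gb(params_8b, 4)   # 4.0

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, "
      f"ratio: {fp16_gb / int4_gb:.0f}x")
```

The same function applied to 70e9 parameters gives 140 GB at 16-bit and 35 GB at 4-bit.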

Related terms