Techniques
Quantization
Also known as: quantisation
In one line
Compressing a model by using lower-precision numbers, so it runs faster and fits on smaller hardware.
What does Quantization mean?
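Quantization stores each of a model's weights in fewer bits. Quantizing a 70B model from 16-bit to 4-bit shrinks its weights about 4x, from roughly 140 GB to roughly 35 GB, typically with modest quality loss. That is small enough for a high-memory laptop, and smaller models quantized the same way fit on ordinary laptops and edge devices.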
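To make "lower-precision numbers" concrete, here is a minimal sketch of one simple scheme: symmetric round-to-nearest quantization with a single scale for the whole list. The function names `quantize` and `dequantize` and the sample weights are made up for illustration. Real tools generally use more elaborate schemes (for example, a separate scale per small group of weights), so treat this as a toy model of the idea rather than how any particular library works.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.53, 0.98, -0.05, 0.31]
q, scale = quantize(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("integers:", q)
print("restored:", [round(r, 3) for r in restored])
print("largest rounding error:", round(max_error, 4))
```

Only the small integers and the one scale need to be stored. The rounding error printed at the end is the source of the quality loss: each restored weight can be off by up to half a quantization step.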
A real-world example
Running Llama 3.1 8B in 4-bit quantized form on a MacBook, where the weights take roughly 4 to 5 GB instead of about 16 GB at 16-bit.
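The sizes quoted in this entry follow from simple arithmetic: parameters times bits per weight, divided by 8 to get bytes. The sketch below uses a hypothetical helper, `weight_memory_gb`, and round parameter counts. It counts weights only, so it ignores the small overhead of stored scales and the extra memory needed while the model is running.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Round parameter counts, used here as approximations.
for name, params in [("8B", 8e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: {fp16:.0f} GB at 16-bit, {int4:.0f} GB at 4-bit "
          f"({fp16 / int4:.0f}x smaller)")
```

By this estimate, an 8B model drops from about 16 GB to about 4 GB, which is why it fits comfortably in the memory of a typical recent laptop.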
Related terms
Large Language Model (LLM)
A neural network trained on huge text collections to predict the next word (more precisely, the next token). It is the engine behind ChatGPT, Claude and Gemini.
Inference
The act of running a trained model to get an answer — as opposed to training it.
Open-source Model
A model whose weights are freely downloadable, so anyone can run or modify it.

