At the heart of GPT’s compression lies arithmetic coding, a method that turns text into numbers with surgical precision. Like a GPS encoding a house’s location, it captures sentences in compact codes. How does this engine work, and why is it so effective?
The Mechanics
GPT predicts a probability for each candidate next token (e.g., P(“future” | “Artificial intelligence is”) = 0.6), and arithmetic coding uses these probabilities to divide [0, 1) into nested subintervals:
- Start with [0, 1).
- Assign [0, 0.6) to “future,” narrowing the range.
- Iterate for each token, ending with a tiny interval (e.g., [0.3654321, 0.3654343)).
- Output a binary number inside the final interval as the compressed code.
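To make these steps concrete, here is a minimal Python sketch of the interval narrowing. The `predict_probs` function is a toy stand-in for GPT’s next-token distribution; a real compressor would query the model at every step.

```python
# Minimal sketch of arithmetic-coding interval narrowing.
# `predict_probs` is a toy stand-in for GPT's next-token distribution.

def predict_probs(context):
    # Illustrative fixed distribution; a real compressor queries GPT here.
    return {"future": 0.6, "past": 0.3, "present": 0.1}

def encode(tokens):
    low, high = 0.0, 1.0                 # step 1: start with [0, 1)
    context = []
    for token in tokens:
        probs = predict_probs(context)
        width = high - low
        cumulative = 0.0
        for candidate, p in probs.items():
            if candidate == token:
                # step 2: shrink to this token's slice of the current range
                high = low + width * (cumulative + p)
                low = low + width * cumulative
                break
            cumulative += p
        context.append(token)            # step 3: repeat for every token
    # step 4: any number inside [low, high) identifies the whole sequence
    return (low + high) / 2

print(encode(["future"]))                # 0.3, the midpoint of the final interval [0.0, 0.6)
```

Real arithmetic coders sidestep floating-point precision limits by working with integer ranges and renormalizing as they go, but the narrowing logic is the same.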
Decompression uses the same GPT model to reverse the process, ensuring bit-level accuracy. Why is the same model critical?
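A rough decoding sketch, reusing the toy distribution above, shows how the decoder retraces the same subintervals and picks whichever one contains the code; this only works if both sides compute identical probabilities.

```python
# Sketch of decoding with the same toy distribution as the encoder sketch.
# With a real GPT, encoder and decoder must query the identical model, or the
# reconstructed intervals (and hence the recovered tokens) would diverge.

def predict_probs(context):
    return {"future": 0.6, "past": 0.3, "present": 0.1}

def decode(code, n_tokens):
    low, high = 0.0, 1.0
    context, out = [], []
    for _ in range(n_tokens):
        probs = predict_probs(context)
        width = high - low
        cumulative = 0.0
        for candidate, p in probs.items():
            new_low = low + width * cumulative
            new_high = new_low + width * p
            if new_low <= code < new_high:   # the code falls in this slice
                out.append(candidate)
                low, high = new_low, new_high
                break
            cumulative += p
        context.append(out[-1])
    return out

print(decode(0.3, 1))   # ['future'], since 0.3 lies inside [0.0, 0.6)
```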
A GPS Analogy
Compression is like encoding a villa’s address into a postal code. Decompression follows this code to the exact spot. This precision ensures no loss. How does this analogy clarify the process?
The Edge of Efficiency
The more accurately GPT predicts the next token, the wider the subinterval it assigns to the text that actually occurs, and the fewer bits are needed to pin down the final range. What limits this approach, and how might better models enhance it?
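A back-of-the-envelope way to see this: arithmetic coding spends roughly -log2(P) bits per token, so tokens the model predicts confidently are nearly free while surprising ones are expensive. The probabilities below are illustrative only.

```python
import math

# Approximate per-token cost in arithmetic coding: about -log2(P) bits.
# Illustrative probabilities only; real values come from the model.
for p in (0.99, 0.6, 0.1, 0.01):
    print(f"P = {p:<4} -> ~{-math.log2(p):.2f} bits")
```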
Original post: https://liweinlp.com/13273