At the heart of GPT’s compression lies arithmetic coding, a method that turns text into numbers with surgical precision. Like a GPS encoding a house’s location, it captures sentences in compact codes. How does this engine work, and why is it so effective?
The Mechanics
GPT predicts a probability for each candidate next token (e.g., P(“future” | “Artificial intelligence is”) = 0.6), and arithmetic coding uses these probabilities to divide [0, 1) into nested subintervals:
- Start with [0, 1).
- Assign [0, 0.6) to “future,” narrowing the range.
- Iterate for each token, ending with a tiny interval (e.g., [0.3654321, 0.3654343)).
- Output a binary number inside the final interval as the compressed code.
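To make these steps concrete, here is a minimal Python sketch of the interval narrowing. The `predict_probs` function is a toy stand-in for GPT’s next-token distribution; a real compressor would query the model at every step.

```python
# Minimal sketch of arithmetic-coding interval narrowing.
# `predict_probs` is a toy stand-in for GPT's next-token distribution.

def predict_probs(context):
    # Illustrative fixed distribution; a real compressor queries GPT here.
    return {"future": 0.6, "past": 0.3, "present": 0.1}

def encode(tokens):
    low, high = 0.0, 1.0                 # step 1: start with [0, 1)
    context = []
    for token in tokens:
        probs = predict_probs(context)
        width = high - low
        cumulative = 0.0
        for candidate, p in probs.items():
            if candidate == token:
                # step 2: shrink to this token's slice of the current range
                high = low + width * (cumulative + p)
                low = low + width * cumulative
                break
            cumulative += p
        context.append(token)            # step 3: repeat for every token
    # step 4: any number inside [low, high) identifies the whole sequence
    return (low + high) / 2

print(encode(["future"]))                # 0.3, the midpoint of the final interval [0.0, 0.6)
```

Real arithmetic coders sidestep floating-point precision limits by working with integer ranges and renormalizing as they go, but the narrowing logic is the same.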
Decompression uses the same GPT model to reverse the process, ensuring bit-level accuracy. Why is the same model critical?
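A rough decoding sketch, reusing the toy distribution above, shows how the decoder retraces the same subintervals and picks whichever one contains the code; this only works if both sides compute identical probabilities.

```python
# Sketch of decoding with the same toy distribution as the encoder sketch.
# With a real GPT, encoder and decoder must query the identical model, or the
# reconstructed intervals (and hence the recovered tokens) would diverge.

def predict_probs(context):
    return {"future": 0.6, "past": 0.3, "present": 0.1}

def decode(code, n_tokens):
    low, high = 0.0, 1.0
    context, out = [], []
    for _ in range(n_tokens):
        probs = predict_probs(context)
        width = high - low
        cumulative = 0.0
        for candidate, p in probs.items():
            new_low = low + width * cumulative
            new_high = new_low + width * p
            if new_low <= code < new_high:   # the code falls in this slice
                out.append(candidate)
                low, high = new_low, new_high
                break
            cumulative += p
        context.append(out[-1])
    return out

print(decode(0.3, 1))   # ['future'], since 0.3 lies inside [0.0, 0.6)
```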
A GPS Analogy
Compression is like encoding a villa’s address into a postal code. Decompression follows this code to the exact spot. This precision ensures no loss. How does this analogy clarify the process?
The Edge of Efficiency
The more accurately GPT predicts the next token, the wider the subinterval it assigns to the text that actually occurs, and the fewer bits are needed to pin down the final range. What limits this approach, and how might better models enhance it?
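A back-of-the-envelope way to see this: arithmetic coding spends roughly -log2(P) bits per token, so tokens the model predicts confidently are nearly free while surprising ones are expensive. The probabilities below are illustrative only.

```python
import math

# Approximate per-token cost in arithmetic coding: about -log2(P) bits.
# Illustrative probabilities only; real values come from the model.
for p in (0.99, 0.6, 0.1, 0.01):
    print(f"P = {p:<4} -> ~{-math.log2(p):.2f} bits")
```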
Original post: https://liweinlp.com/13273