Imagine a cosmic library holding every possible sentence, from the profound “Artificial intelligence will reshape the future” to the absurd “Cat pillow jumps blue.” Popular sentences sit on prominent shelves, easily found with a short note like “Shelf 3, Book 5.” Random gibberish hides in dusty basements, requiring a long, word-for-word map. GPT, our cosmic librarian, navigates this library with uncanny precision, compressing texts into compact codes that can be perfectly restored. How does it work, and why is this a game-changer for data compression?
The Library of Language
In this infinite library, each sentence has a “popularity” score—its probability based on grammar, meaning, and context. GPT’s world model, trained on vast texts, assigns high probabilities to meaningful sentences, making them easier to locate. For example, “Artificial intelligence will reshape the future” is a bestseller, while “Cat pillow jumps blue” is obscure. Compression is about encoding these locations efficiently. How might GPT’s understanding of language make this possible?
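To make the "popularity score" concrete, here is a minimal sketch with invented numbers standing in for GPT's predictions: a sentence's probability is, by the chain rule, the product of each token's conditional probability given everything before it.

```python
import math

# Hypothetical conditional probabilities a GPT-like model might assign,
# P(token | all previous tokens) — invented for illustration only.
bestseller = [0.05, 0.8, 0.6, 0.3]      # "Artificial intelligence will reshape ..."
obscure    = [0.01, 1e-5, 1e-6, 1e-6]   # "Cat pillow jumps blue"

def sentence_prob(cond_probs):
    """Chain rule: P(sentence) = product of P(token_i | tokens before i)."""
    return math.prod(cond_probs)

print(sentence_prob(bestseller))  # ~0.007: a prominent shelf
print(sentence_prob(obscure))     # ~1e-19: deep in the dusty basement
```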
Arithmetic Coding: The Magic Wand
GPT teams up with arithmetic coding to turn sentences into numbers. Here’s how it compresses “Artificial intelligence will reshape…” (tokenized as “Artificial,” “intelligence,” “will,” …):
- Start with [0.0, 1.0]: The unit interval represents the space of all possible sequences.
- Encode “Artificial”: GPT predicts a 5% chance (P=0.05) that this word is the first token, so the interval shrinks to this token’s slice of width 0.05, here [0.0, 0.05].
- Encode “intelligence”: Given “Artificial,” GPT predicts an 80% chance (P=0.8), narrowing to [0.0, 0.04].
- Continue: Each token shrinks the interval further, ending with a tiny range, say [0.02113, 0.02114].
- Output: Pick a number inside the final interval, such as 0.02113, and write it in binary (≈ 0.0000010101…); those bits are the compressed result of the processed sentence.
Decompression reverses this, using the same GPT model to retrace the intervals and reconstruct the exact text. Why does this ensure no data is lost?
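Here is a minimal, runnable sketch of the whole round trip. A hand-written table (FAKE_MODEL) stands in for GPT's next-token predictions, and floating-point intervals stand in for the integer arithmetic a production coder would use; both are simplifications for illustration, not the actual implementation.

```python
# Toy arithmetic coder driven by a stand-in "language model".
# FAKE_MODEL is a hypothetical, hand-written table; in the real scheme GPT
# supplies these distributions. Floats keep the demo short — production
# arithmetic coders use integer intervals with renormalization.

FAKE_MODEL = {
    (): {"Artificial": 0.05, "Cat": 0.01, "The": 0.94},
    ("Artificial",): {"intelligence": 0.8, "flowers": 0.2},
    ("Artificial", "intelligence"): {"will": 0.6, "is": 0.4},
    ("Artificial", "intelligence", "will"): {"reshape": 0.3, "help": 0.7},
}

def next_token_probs(context):
    """Stand-in for GPT: P(next token | context)."""
    return FAKE_MODEL[tuple(context)]

def encode(tokens):
    """Shrink [0, 1) once per token; return the final interval."""
    low, high = 0.0, 1.0
    for i, token in enumerate(tokens):
        probs = next_token_probs(tokens[:i])
        width = high - low
        cum = 0.0
        for candidate, p in probs.items():
            if candidate == token:
                low, high = low + width * cum, low + width * (cum + p)
                break
            cum += p
    return low, high

def decode(value, n_tokens):
    """Retrace the same intervals with the same model to recover the text.
    A real coder would use an end-of-text token instead of passing n_tokens."""
    tokens = []
    low, high = 0.0, 1.0
    for _ in range(n_tokens):
        probs = next_token_probs(tokens)
        width = high - low
        cum = 0.0
        for candidate, p in probs.items():
            lo, hi = low + width * cum, low + width * (cum + p)
            if lo <= value < hi:
                tokens.append(candidate)
                low, high = lo, hi
                break
            cum += p
    return tokens

sentence = ["Artificial", "intelligence", "will", "reshape"]
low, high = encode(sentence)
print(low, high)                                 # roughly [0.0, 0.0072): a tiny slice of [0, 1)
print(decode((low + high) / 2, len(sentence)))   # ['Artificial', 'intelligence', 'will', 'reshape']
```

Because the decoder consults the identical model at every step, it sees exactly the same sub-intervals the encoder did, so the text comes back unchanged.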
Information Theory: Why Predictability Saves Space
Information theory reveals why this works. A word’s information content is -log₂(P(x)): high-probability words carry little information, while rare words carry more. Predictable sentences, rich in semantic patterns, occupy larger intervals on the number line and need fewer bits to pin down. Why might random text, like white noise, resist compression?
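A quick back-of-the-envelope check, reusing the same invented probabilities as the sketches above: summing each token’s -log₂(P) gives the total bit cost, and it equals -log₂ of the width of the final arithmetic-coding interval.

```python
import math

def info_bits(p):
    """Information content of an event with probability p, in bits."""
    return -math.log2(p)

print(info_bits(0.8))     # ~0.32 bits: a highly predictable token is almost free
print(info_bits(0.0001))  # ~13.3 bits: a surprising token is expensive

# Cost of a whole sequence = sum of per-token information,
# which equals -log2 of the final arithmetic-coding interval's width.
token_probs = [0.05, 0.8, 0.6, 0.3]            # hypothetical GPT predictions
total_bits = sum(info_bits(p) for p in token_probs)
interval_width = math.prod(token_probs)
print(total_bits, -math.log2(interval_width))  # both ~7.12 bits
```

Random text gives every token a tiny probability, so each one costs many bits and the total never drops below the original size, which is why white noise resists compression.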
Why It Matters
This approach could revolutionize data storage and transmission, from archiving logs to sending messages across galaxies. But what challenges might arise in real-world applications? How could GPT’s predictive power evolve with better models?
Original post: https://liweinlp.com/13277