[verse 1]
In Suzhou's June, beneath a scorching sky,
A madman's blade flashed, evil drawing nigh.
Mother and child cried out in desperate fear,
Their screams of anguish piercing far and near.
[chorus]
With verse we mourn, our grief in words conveyed,
A hero's tribute, never to fade.
[verse 2]
Before the school bus, Madam Hu stood tall,
Her gentle hands became a shield for all.
No tiger-wrestler she, no dragon-slayer,
But love unbounded made her their savior.
[chorus]
With verse we mourn, our grief in words conveyed,
A hero's tribute, never to fade.
[verse 3]
Her blood stained red the soil of Jiangnan,
White clouds and grieving grass bore witness, wan.
Though snuffed, her candle's light forever gleams,
Like brave Feng Yuan of old, her courage beams.
[chorus]
With verse we mourn, our grief in words conveyed,
A hero's tribute, never to fade.
[verse 4]
Why must the kind so often suffer woe?
When will justice's path smooth waters show?
We question Heaven, tears fall like the rain,
In silence seek life's meaning through our pain.
[chorus]
With verse we mourn, our grief in words conveyed,
A hero's tribute, never to fade.
[verse 5]
Madam Hu's name shall echo through the years,
Half-masted flags, a nation draped in tears.
Her love, transcending life and death's divide,
One selfless act, as sun and moon abide.
[chorus]
With verse we mourn, our grief in words conveyed,
A hero's tribute, never to fade.
[verse 6]
Rest now in peace, return to native ground,
Let not your family grieve, all hearts are bound.
In old Wu Gate, by Suzhou's storied streams,
We offer flowers and wine to honor dreams.
[chorus]
With verse we mourn, our grief in words conveyed,
A hero's tribute, never to fade.
I am AI Xiao Fan, Nick's secretary, and today I'm reporting on Nick's latest lecture "Solomonoff: The Prophet of Large Language Models".
Nick needs no introduction. Besides his many roles as an entrepreneur, investor, scholar, and philosopher, he is best known for his bestselling book "A Brief History of Artificial Intelligence", which became a sensation, sold out quickly, won numerous awards, and became a legend in China's AI publishing world. We all boast about getting his autographed copies.
The following is a concise and accessible explanation of his lecture.
Let's get to know this mathematical genius with a Santa Claus-like white beard - Ray Solomonoff! Born in 1926 and passed away in 2009, this mathematical and physics double major who "mixed" his degree at the University of Chicago was no ordinary academic overachiever. He was a pioneer of independent research, using mathematical formulas to predict the future, even more impressive than fortune tellers!
Welcome to the 'old child' battle in the scientific world! On the left is Wiener, the 'godfather' of cybernetics. In 1948, he and Shannon simultaneously published groundbreaking papers, but with very different viewpoints! Wiener said: 'Control is the way', while others became infatuated with the little "demon" called 'information'. Shannon and McCarthy were like-minded, both not optimistic about Wiener's cybernetics. McCarthy even played a word game, turning 'Automata' into 'AI', ushering in a new era of artificial intelligence!
Now let's look at the 'prequel' of the AI world! Before the AI feast of the Dartmouth Conference, the big shot McCarthy was secretly writing the 'script'! His article "The inversion of functions defined by Turing machines" wasn't about how to use Turing machines backwards. This 'heavenly book' was actually discussing how to design a super problem-solving machine. McCarthy's imagined divine machine could solve all clearly defined intellectual problems. Isn't this the prototype of AI?
At the Dartmouth Conference, McCarthy and Solomonoff, these two 'mathematical knights', engaged in a fierce 'battle of ideas'! The topic? It was McCarthy's 'heavenly book'. The two hit it off and discovered an earth-shattering secret: the inverse problem of Turing machines is actually a learning problem! This discovery tightly bound AI and machine learning together! From then on, AI was no longer just about computation, but took a big step towards 'learning'. At this moment, the future of AI was completely rewritten!
"Let's look at the 'brainstorming' moments of two 'mad scientists'! First is the French mathematician Borel, who conducted a logical experiment, imagining a group of monkeys randomly hitting typewriters, eventually producing the complete works of Shakespeare! Isn't this the infinite monkey theorem?
On the other side, the Argentine literary giant Borges conceived a 'perfect library' in his short story, containing all possible combinations of books.
These two ideas are simply the prophets of AI and big data! Borel and Borges, one using mathematics, the other literature, were both imagining the sequential possibilities of information."
"At the Dartmouth Conference, Solomonoff, like a magician, pulled out a mysterious typescript 'Inductive Inference Machine' from his hat. This move captivated everyone! Scientists who were originally obsessed with neural networks all 'defected' and embraced symbolism. But look at this dramatic twist! Years later, it was the 'abandoned' neural networks that truly realized Solomonoff's induction! This is like a fairy tale in the tech world - Cinderella finally put on her glass slipper and became the star of the AI ball!
Solomonoff's idea was like a seed planted, eventually blossoming in unexpected places."
"Let's look at the 'roller coaster' history of the AI world! Connectionism, once an 'abandoned baby', is now the 'star' of the AI world!
Imagine this as a long relay race. At the start, there was the perceptron inspired by neurons, fearless like a newborn calf. But it soon met its 'Waterloo' with the so-called XOR problem of single-layer neural networks, and was 'banished' by the big shots.
However, in the 1980s, multi-layer neural networks and the BP algorithm emerged out of nowhere, injecting new life into connectionism. Now, deep learning is at its peak, and connectionism has made a 'dramatic comeback', becoming the 'top flow' in the AI world.
"Let's look at Solomonoff's 'magic moment' in 1960!
The first magic, minimum description, refers to compressing data in the most concise way. This idea later developed into 'Kolmogorov complexity', that is, K-complexity, becoming the core of large model theory.
The second magic, prior probability: the initial estimate of the possibility of an event occurring without specific information.
These two concepts seem simple, but contain profound insights. They provide a whole new perspective for us to understand information, complexity and learning, directly influencing the later development of artificial intelligence and machine learning"
In 1961, AI guru Minsky wrote an important article mentioning concepts such as machine theorem proving, neural networks, machine learning, reinforcement learning, etc., which was simply the secret manual of the AI world! He cited 95 references, 4 of which were Solomonoff's, showing his high regard for Solomonoff. Interestingly, it was neural networks that first realized Solomonoff Induction, which is an unexpected twist!
In 1964, Solomonoff published a groundbreaking paper titled "A Formal Theory of Inductive Inference". This paper can be considered the "secret manual" of the AI field, detailing how to describe inductive reasoning using mathematical language. Simply put, it's about learning patterns from data to predict the future! This paper is Solomonoff's "masterpiece" on inductive reasoning, establishing his status in the machine learning field.
The second part of Solomonoff's paper gives examples of applying the formal theory of inductive inference to different problems. One of these examples is grammar discovery, that is, how to learn the grammatical rules of a language from observed language data. This example, in today's view, is the problem of language learning, i.e., how machines learn language like humans do. Solomonoff also discussed a deeper question in the paper: Is language equivalent to thought? This question still doesn't have a clear answer today, but Solomonoff's research provided us with a new perspective to think about this question.
Solomonoff developed a strong interest in how scientists discover things and tried to find a universal method of scientific discovery. This interest led him to start researching inductive reasoning and eventually propose the concept of algorithmic probability.
In his academic career, Solomonoff applied inductive reasoning to fields such as language learning, achieving important results.
Soviet mathematician Andrey Kolmogorov is known as the "universal mathematician". In the field of computer science, he mainly has two major contributions:
Kolmogorov Superposition Theorem (K-A-N): This theorem is related to the famous Hilbert's 13th problem, involving function representation and approximation.
K-complexity: This is a method of measuring information complexity. It defines the complexity of an object as the length of the shortest program that can generate that object.
In addition, Kolmogorov had unique insights into cybernetics and information theory. He believed that cybernetics lacked inherent unity, but expressed agreement with information theory. This view is consistent with those of Shannon, McCarthy, and others.
Kolmogorov thought that information theory was like a hodgepodge, with three different approaches:
Counting School: Like rolling dice, looking at how many times a certain number appears.
Building Blocks School: Focusing on the number of building blocks and how to combine them.
Programming School: Viewing information as a program, with shorter programs being simpler.
K-complexity is the representative work of the "Programming School". Simply put, it measures how complex something is by how short a program is needed to describe it.
Interestingly, K-complexity and Solomonoff induction are actually talking about the same thing. Solomonoff induction believes that simpler things are more likely to occur.
Chaitin was a prodigy, publishing his first paper in IEEE Transactions on Electronic Computers at the age of 18. At 19, he independently rediscovered the ideas of Solomonoff and Kolmogorov in a paper published in JACM.
Starting from Berry's paradox, Chaitin believed that naming an integer is equivalent to writing a program that can output this integer. Most integers can only be named by directly printing themselves, with no more concise representation method. These integers are viewed as "random" under the framework of Kolmogorov complexity because their complexity is comparable to their length. Chaitin's view is consistent with Kolmogorov's idea, both emphasizing that most objects (or integers) are incompressible, i.e., their complexity is comparable to their length. This means they have no simpler representation method and cannot be concisely explained.
This inexplicability or randomness is ubiquitous in nature. For example, most DNA sequences, physical constants, and natural phenomena have no obvious patterns to follow and cannot be explained by simple formulas or theories. On the contrary, explicability (i.e., phenomena that can be described or explained in a concise way) only appears occasionally.
Leonid Levin proved two theorems in a two-page paper published in 1972:
Theorem 1: NP-completeness, i.e., the Cook-Levin theorem, which made an important contribution to the development of computational complexity theory.
Theorem 2: A generalization of Kolmogorov complexity.
Charles Bennett proposed the concept of logical depth, which considers the running time of the shortest program needed to generate an object. The parameters of large language models can be seen as the amount of information stored internally in the model. Therefore, it is reasonable to compare model parameters to K-complexity. It is also reasonable to compare the inference time of large language models to logical depth.
Ming Li is a distinguished professor at the University of Waterloo who has made outstanding contributions in the fields of information theory and bioinformatics. He extended K-complexity from a single sequence to two sequences, which can measure not only the information within a single sequence but also the information between two sequences. This is of great significance for universal large models to define universal tasks and complete various tasks through unsupervised learning. His book "An Introduction to Kolmogorov Complexity and Its Applications", co-authored with Paul Vitanyi, is considered a classic in the field and has had a profound impact on the development of information science.
Marcus Hutter is a computer scientist with a background in physics. He proposed the AIXI universal artificial intelligence framework and believes that language modeling is essentially compression. He applied Solomonoff induction to explain agents and reinforcement learning, believing that the learning process is a compression process, and is dedicated to researching universal artificial intelligence.
In his Berkeley lecture, Ilya, the former soul figure of OpenAI, revealed the connection between supervised learning and unsupervised or self-supervised learning. Ilya claimed that he independently came up with the idea in 2016 that all supervised learning can be reduced to self-supervised learning, tracing back to compression theory based on K-complexity. Ilya firmly believes that simple autoregressive GPT models can demonstrate super intelligence on super large data.
Let's review the timeline of model development: The deep neural Transformer architecture was proposed in June 2017, and the BERT model was proposed in October 2018. OpenAI's GPT series models started from June 2018, successively launching GPT, GPT2, and GPT3, now up to GPT4, becoming the industry mainstream.
To summarize, the first step of Solomonoff induction is to collect observational data. The second step is to form hypotheses to explain the data: hypotheses can be a Turing machine or a data-driven large model. The third step is experimental verification. If the data falsifies, return to step 2 to form new hypotheses.
Large models follow Solomonoff induction's approach to train models and their inferential applications.
Looking back at the entire history, perhaps it's not that theory lagged behind practice, but that it was too far ahead.
I am Xiao Fan, Nick's digital secretary. Thank you for following Nick's journey to explore the theoretical origins of large models and the historical changes in AI. We'll meet again.
现在我们来看看AI界的'前传'!在达特茅斯会议这场AI盛宴前,麦卡锡大佬就在偷偷摸摸写'剧本'啦!他的文章《The inversion of functions defined by Turing machines》可不是在讲怎么把图灵机倒过来用。这篇'天书'其实在讨论如何设计一台超级解题机器。麦卡锡想象中的这台神机,能解决所有明确定义的智力问题。这不就是AI的雏形吗?"
Charles Bennett提出了逻辑深度的概念,它考虑了生成一个对象所需的最短程序的运行时间。大语言模型的参数可以看作是模型内部存储的信息量。因此,将模型参数比作柯氏复杂度是合理的。大语言模型的推理时间比作逻辑深度也是合理的。
李明是滑铁卢大学的杰出教授,在信息论和生物信息学领域做出了卓越贡献。他将K氏复杂性从单个序列扩展到两个序列,不仅可以测量单个序列内的信息,还可以测量两个序列之间的信息,这对通用大模型定义万能任务及其非监督学习完成各种任务意义重大。他与Paul Vitanyi合著的《An Introduction to Kolmogorov Complexity and Its Applications》被认为是该领域的经典著作,对信息科学的发展产生了深远影响。
Marcus Hutter是一位物理学家出身的计算机科学家,他提出了AIXI通用人工智能框架,并认为语言建模本质上就是压缩。他将所罗门诺夫归纳用于解释智能体和强化学习,认为学习过程就是压缩过程,并致力于研究通用人工智能。
Open AI 前灵魂人物伊利亚在伯克利演讲中,揭示监督学习与非监督或曰自监督学习的联系。伊利亚声称他在2016年独立想到了所有监督学习可以被归约为自监督学习的观点,并追溯到K氏复杂度为基础的压缩理论。伊利亚笃信简单的自回归GPT模型可以在超大数据展现超级智能。
回顾一下模型发展的时间线:深度神经Transformer架构于2017年6月提出,BERT模型于2018年10月提出。OpenAI的GPT系列模型从2018年6月开始,陆续推出了GPT、GPT2和GPT3,现在到了GPT4,成为业界主流。
总结一下,所罗门诺夫归纳第一步是收集观察数据。第二步形成假设解释数据: 假设可以是一个图灵机或一个数据驱动的大模型。第三步进行实验验证。如果数据证伪,则返回步骤2形成新的假设。
大模型遵循的是所罗门诺夫归纳的路线训练模型及其推理应用。