6.99 10/15 bnQ:/ [email protected] 6月26日-English https://v.douyin.com/i6MbUvKH/ 复制此链接,打开Dou音搜索,直接观看视频!
Click this link: https://v.douyin.com/i6MbUvKH/
I am AI Xiao Fan, Nick's secretary, and today I'm reporting on Nick's latest lecture "Solomonoff: The Prophet of Large Language Models".
Nick needs no introduction. Besides his many roles as an entrepreneur, investor, scholar, and philosopher, he is best known for his bestselling book "A Brief History of Artificial Intelligence", which became a sensation, sold out quickly, won numerous awards, and became a legend in China's AI publishing world. We all boast about getting his autographed copies.
The following is a concise and accessible explanation of his lecture.
Let's get to know this mathematical genius with a Santa Claus-like white beard - Ray Solomonoff! Born in 1926 and passed away in 2009, this mathematical and physics double major who "mixed" his degree at the University of Chicago was no ordinary academic overachiever. He was a pioneer of independent research, using mathematical formulas to predict the future, even more impressive than fortune tellers!
Welcome to the 'old child' battle in the scientific world! On the left is Wiener, the 'godfather' of cybernetics. In 1948, he and Shannon simultaneously published groundbreaking papers, but with very different viewpoints! Wiener said: 'Control is the way', while others became infatuated with the little "demon" called 'information'. Shannon and McCarthy were like-minded, both not optimistic about Wiener's cybernetics. McCarthy even played a word game, turning 'Automata' into 'AI', ushering in a new era of artificial intelligence!
Now let's look at the 'prequel' of the AI world! Before the AI feast of the Dartmouth Conference, the big shot McCarthy was secretly writing the 'script'! His article "The inversion of functions defined by Turing machines" wasn't about how to use Turing machines backwards. This 'heavenly book' was actually discussing how to design a super problem-solving machine. McCarthy's imagined divine machine could solve all clearly defined intellectual problems. Isn't this the prototype of AI?
At the Dartmouth Conference, McCarthy and Solomonoff, these two 'mathematical knights', engaged in a fierce 'battle of ideas'! The topic? It was McCarthy's 'heavenly book'. The two hit it off and discovered an earth-shattering secret: the inverse problem of Turing machines is actually a learning problem! This discovery tightly bound AI and machine learning together! From then on, AI was no longer just about computation, but took a big step towards 'learning'. At this moment, the future of AI was completely rewritten!
"Let's look at the 'brainstorming' moments of two 'mad scientists'! First is the French mathematician Borel, who conducted a logical experiment, imagining a group of monkeys randomly hitting typewriters, eventually producing the complete works of Shakespeare! Isn't this the infinite monkey theorem?
On the other side, the Argentine literary giant Borges conceived a 'perfect library' in his short story, containing all possible combinations of books.
These two ideas are simply the prophets of AI and big data! Borel and Borges, one using mathematics, the other literature, were both imagining the sequential possibilities of information."
"At the Dartmouth Conference, Solomonoff, like a magician, pulled out a mysterious typescript 'Inductive Inference Machine' from his hat. This move captivated everyone! Scientists who were originally obsessed with neural networks all 'defected' and embraced symbolism. But look at this dramatic twist! Years later, it was the 'abandoned' neural networks that truly realized Solomonoff's induction! This is like a fairy tale in the tech world - Cinderella finally put on her glass slipper and became the star of the AI ball!
Solomonoff's idea was like a seed planted, eventually blossoming in unexpected places."
"Let's look at the 'roller coaster' history of the AI world! Connectionism, once an 'abandoned baby', is now the 'star' of the AI world!
Imagine this as a long relay race. At the start, there was the perceptron inspired by neurons, fearless like a newborn calf. But it soon met its 'Waterloo' with the so-called XOR problem of single-layer neural networks, and was 'banished' by the big shots.
However, in the 1980s, multi-layer neural networks and the BP algorithm emerged out of nowhere, injecting new life into connectionism. Now, deep learning is at its peak, and connectionism has made a 'dramatic comeback', becoming the 'top flow' in the AI world.
"Let's look at Solomonoff's 'magic moment' in 1960!
The first magic, minimum description, refers to compressing data in the most concise way. This idea later developed into 'Kolmogorov complexity', that is, K-complexity, becoming the core of large model theory.
The second magic, prior probability: the initial estimate of the possibility of an event occurring without specific information.
These two concepts seem simple, but contain profound insights. They provide a whole new perspective for us to understand information, complexity and learning, directly influencing the later development of artificial intelligence and machine learning"
In 1961, AI guru Minsky wrote an important article mentioning concepts such as machine theorem proving, neural networks, machine learning, reinforcement learning, etc., which was simply the secret manual of the AI world! He cited 95 references, 4 of which were Solomonoff's, showing his high regard for Solomonoff. Interestingly, it was neural networks that first realized Solomonoff Induction, which is an unexpected twist!
In 1964, Solomonoff published a groundbreaking paper titled "A Formal Theory of Inductive Inference". This paper can be considered the "secret manual" of the AI field, detailing how to describe inductive reasoning using mathematical language. Simply put, it's about learning patterns from data to predict the future! This paper is Solomonoff's "masterpiece" on inductive reasoning, establishing his status in the machine learning field.
The second part of Solomonoff's paper gives examples of applying the formal theory of inductive inference to different problems. One of these examples is grammar discovery, that is, how to learn the grammatical rules of a language from observed language data. This example, in today's view, is the problem of language learning, i.e., how machines learn language like humans do. Solomonoff also discussed a deeper question in the paper: Is language equivalent to thought? This question still doesn't have a clear answer today, but Solomonoff's research provided us with a new perspective to think about this question.
Solomonoff developed a strong interest in how scientists discover things and tried to find a universal method of scientific discovery. This interest led him to start researching inductive reasoning and eventually propose the concept of algorithmic probability.
In his academic career, Solomonoff applied inductive reasoning to fields such as language learning, achieving important results.
Soviet mathematician Andrey Kolmogorov is known as the "universal mathematician". In the field of computer science, he mainly has two major contributions:
Kolmogorov Superposition Theorem (K-A-N): This theorem is related to the famous Hilbert's 13th problem, involving function representation and approximation.
K-complexity: This is a method of measuring information complexity. It defines the complexity of an object as the length of the shortest program that can generate that object.
In addition, Kolmogorov had unique insights into cybernetics and information theory. He believed that cybernetics lacked inherent unity, but expressed agreement with information theory. This view is consistent with those of Shannon, McCarthy, and others.
Kolmogorov thought that information theory was like a hodgepodge, with three different approaches:
Counting School: Like rolling dice, looking at how many times a certain number appears.
Building Blocks School: Focusing on the number of building blocks and how to combine them.
Programming School: Viewing information as a program, with shorter programs being simpler.
K-complexity is the representative work of the "Programming School". Simply put, it measures how complex something is by how short a program is needed to describe it.
Interestingly, K-complexity and Solomonoff induction are actually talking about the same thing. Solomonoff induction believes that simpler things are more likely to occur.
Chaitin was a prodigy, publishing his first paper in IEEE Transactions on Electronic Computers at the age of 18. At 19, he independently rediscovered the ideas of Solomonoff and Kolmogorov in a paper published in JACM.
Starting from Berry's paradox, Chaitin believed that naming an integer is equivalent to writing a program that can output this integer. Most integers can only be named by directly printing themselves, with no more concise representation method. These integers are viewed as "random" under the framework of Kolmogorov complexity because their complexity is comparable to their length. Chaitin's view is consistent with Kolmogorov's idea, both emphasizing that most objects (or integers) are incompressible, i.e., their complexity is comparable to their length. This means they have no simpler representation method and cannot be concisely explained.
This inexplicability or randomness is ubiquitous in nature. For example, most DNA sequences, physical constants, and natural phenomena have no obvious patterns to follow and cannot be explained by simple formulas or theories. On the contrary, explicability (i.e., phenomena that can be described or explained in a concise way) only appears occasionally.
Leonid Levin proved two theorems in a two-page paper published in 1972:
Theorem 1: NP-completeness, i.e., the Cook-Levin theorem, which made an important contribution to the development of computational complexity theory.
Theorem 2: A generalization of Kolmogorov complexity.
Charles Bennett proposed the concept of logical depth, which considers the running time of the shortest program needed to generate an object. The parameters of large language models can be seen as the amount of information stored internally in the model. Therefore, it is reasonable to compare model parameters to K-complexity. It is also reasonable to compare the inference time of large language models to logical depth.
Ming Li is a distinguished professor at the University of Waterloo who has made outstanding contributions in the fields of information theory and bioinformatics. He extended K-complexity from a single sequence to two sequences, which can measure not only the information within a single sequence but also the information between two sequences. This is of great significance for universal large models to define universal tasks and complete various tasks through unsupervised learning. His book "An Introduction to Kolmogorov Complexity and Its Applications", co-authored with Paul Vitanyi, is considered a classic in the field and has had a profound impact on the development of information science.
Marcus Hutter is a computer scientist with a background in physics. He proposed the AIXI universal artificial intelligence framework and believes that language modeling is essentially compression. He applied Solomonoff induction to explain agents and reinforcement learning, believing that the learning process is a compression process, and is dedicated to researching universal artificial intelligence.
In his Berkeley lecture, Ilya, the former soul figure of OpenAI, revealed the connection between supervised learning and unsupervised or self-supervised learning. Ilya claimed that he independently came up with the idea in 2016 that all supervised learning can be reduced to self-supervised learning, tracing back to compression theory based on K-complexity. Ilya firmly believes that simple autoregressive GPT models can demonstrate super intelligence on super large data.
Let's review the timeline of model development: The deep neural Transformer architecture was proposed in June 2017, and the BERT model was proposed in October 2018. OpenAI's GPT series models started from June 2018, successively launching GPT, GPT2, and GPT3, now up to GPT4, becoming the industry mainstream.
To summarize, the first step of Solomonoff induction is to collect observational data. The second step is to form hypotheses to explain the data: hypotheses can be a Turing machine or a data-driven large model. The third step is experimental verification. If the data falsifies, return to step 2 to form new hypotheses.
Large models follow Solomonoff induction's approach to train models and their inferential applications.
Looking back at the entire history, perhaps it's not that theory lagged behind practice, but that it was too far ahead.
I am Xiao Fan, Nick's digital secretary. Thank you for following Nick's journey to explore the theoretical origins of large models and the historical changes in AI. We'll meet again.
【立委NLP频道】