In 1948, inspired by psychiatric patients, British doctor Ross Ashby invented a peculiar machine called the "Homeostat." He proclaimed that this device, costing about 50 pounds, was "the closest thing to an artificial brain ever designed by mankind." The Homeostat utilized four bomb control switch gear devices from the British Royal Air Force, used during World War II, as its base. Above these were four cubic aluminum boxes, with the only visible moving parts being four small magnetic needles on top of the boxes, swaying like compass needles in a small trough of water.
When the machine was activated, the needles moved in response to the electric current from the aluminum boxes. The four magnetic needles were always in a sensitive and fragile state of balance. The sole purpose of the Homeostat was to keep the needles centered, maintaining a "comfortable" state for the machine.
Ashby experimented with various methods to make the machine "uncomfortable," such as reversing the polarity of the electrical connections or the direction of the needles. However, the machine always found ways to adapt to the new state and re-center the needles. Ashby described the machine as "actively" resisting any disturbances to its balance through synaptic action, performing "coordinated activities" to regain equilibrium.
Ashby believed that one day, such a "primitive device" could evolve into an artificial brain more powerful than any human, capable of solving the world's most complex and challenging problems.
Despite Ashby's lack of knowledge about today's AGI evolution and the laughable idea of using four small magnetic needles as sensors for intelligence, his Homeostat fundamentally challenged everyone's understanding of "intelligence" - isn't intelligence the ability to absorb information from the environment in various modalities, and to modify behavior and responses based on feedback?
From the peculiar "Homeostat" to today, 75 years later, Google's Gemini, which claims to have surpassed human multi-modal task processing abilities, accelerates towards the evolution of billions of years of carbon-based intelligence through the injection of multi-modal native big data.
The acceleration speed of machine intelligence evolution today far exceeds our imagination. A year ago, OpenAI overturned Google's long-established AI position with its 'brute force aesthetic,' having constructed the Babel Tower of human languages. A year later, Google countered with Gemini, via a 'fight fire with fire' approach to building the first unified cross-modal model, setting another milestone in AGI evolution.
Despite initial skepticism over exaggerated video demos upon Gemini's release, it's undeniable that the dawn of a unified multi-modal approach is shining. What capabilities does Gemini confirm? How will Google's wheels of fate turn? Is time a friend to OpenAI or Google? What does multi-modality mean for Agents and embodied intelligence? Are the foundations for the emergence of AGI with consciousness already in place? How should we view the implications of Gemini for the AI future?
01.
Cross-modal Knowledge Transfer of Large Models Proven Again
(Gemini Notes Series to be continued)