For more details see my latest book
16th May 2025
Why the deep learning approach to artificial intelligence will not achieve human-level general intelligence, and how general intelligence could be designed
Current artificial intelligence systems depend on what is called deep learning: the organization of huge numbers of simple devices in layers with information propagating through the layers.
Some of these systems can achieve or exceed human capabilities in one specific area. Such systems can play games like Chess or Go and defeat the most skilled humans. Other systems can outperform humans in specific pattern recognition tasks, from diagnosing disease in radiology images to reading badly damaged ancient papyrus. Yet others, such as ChatGPT developed by OpenAI, can hold conversations in which it is hard for a human to distinguish ChatGPT from another human. Why then can we not expect that in due course a single deep learning intelligence will exceed human intelligence in every way? There are a number of reasons why the deep learning approach is unlikely to lead to such a general intelligence.
The first problem is the resources required
A transistor can be regarded as a gate that opens to allow electrical current to pass. OpenAI is reported to use around 720,000 Nvidia H100 GPUs for its operations. Each of these chips contains around 80 billion transistors. This adds up to a total transistor count of around 6 × 10^16.
In the human brain there are around 86 billion neurons. On average each neuron is connected to about 10 thousand other neurons. The point of connection between two neurons is called a synapse. A synapse contains ion channels, which can be regarded as gates that open to allow current to pass from one neuron to another, so an ion channel is very roughly equivalent to a transistor in information processing capability. A critical type is the AMPA channel, of which there are about 100 in the average synapse. On this basis there are about 10^14 AMPA channels in the human brain. There are other types of ion channel, but even if they added up to several times the number of AMPA channels - a somewhat unlikely scenario - this would still mean that the human brain has "only" about 5 × 10^14 current gates.
Thus even with this conservative assumption, OpenAI's processing hardware contains around 100 times as many basic information processing units as the human brain. Since the brain achieves far more general capability with far fewer units, the difference must lie in the way those information processing units are organized.
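As a rough sanity check, the arithmetic can be laid out explicitly. The short Python sketch below simply restates the figures quoted above (all of them approximate public estimates) and computes the ratio:

```python
# Back-of-envelope comparison of "current gate" counts, using the rough
# figures quoted above. All numbers are approximate public estimates.

gpus = 720_000                    # Nvidia H100 GPUs reportedly used by OpenAI
transistors_per_gpu = 80e9        # transistors per H100
ai_gates = gpus * transistors_per_gpu
print(f"AI transistor count: {ai_gates:.1e}")     # ~5.8e16

brain_gates = 5e14                # conservative ion-channel "gate" estimate above
print(f"Brain gate count:    {brain_gates:.1e}")

print(f"Ratio (AI / brain):  ~{ai_gates / brain_gates:.0f}x")   # roughly 100x
```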
Training a system like ChatGPT consumes around 1,300 megawatt-hours, and each query then consumes about 2.9 watt-hours. On average a human can respond to a verbal query in 10 seconds or less. Although a human is not constantly responding to verbal queries, the brain is constantly processing input to determine the most appropriate behaviours. These behaviours include selecting the next part of the visual field to look at, selecting the next body movement, and selecting the next step in some mental process. Such behaviours are selected roughly once every 200 milliseconds, and groups of these behaviours are probably equivalent in processing terms to generating a verbal response. Hence, at roughly one query-equivalent every 10 seconds, in a human lifetime the brain responds to the equivalent of about 300 million ChatGPT queries. ChatGPT would consume about 900 megawatt-hours to generate that many responses, so its total energy for learning and responding is roughly 2,200 megawatt-hours.
A human body at rest requires about 100 watts of power, of which roughly 10 watts supports the brain. Over a lifetime of 100 years, the brain thus consumes about 10^7 watt-hours, or roughly 10 megawatt-hours. So the energy resources required by a current system like ChatGPT are more than 200 times those required by the human brain for equivalent processing. And this is the case for capabilities that fall far short of the general intelligence capabilities of the brain.
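The energy comparison can be checked the same way, again taking the rough figures above at face value:

```python
# Rough lifetime energy comparison, using the approximate figures in the text.

# ChatGPT-style system
training_mwh = 1300                       # reported training energy, MWh
wh_per_query = 2.9                        # reported energy per query, Wh

# Human-equivalent query load: one verbal-response-sized unit of processing
# roughly every 10 seconds, over a 100-year lifetime.
seconds_per_lifetime = 100 * 365.25 * 24 * 3600
queries_per_lifetime = seconds_per_lifetime / 10             # ~3.2e8, i.e. ~300 million

inference_mwh = queries_per_lifetime * wh_per_query / 1e6    # Wh -> MWh, ~900 MWh
ai_total_mwh = training_mwh + inference_mwh                  # ~2,200 MWh

# Human brain: ~10 W running continuously for 100 years
brain_mwh = 10 * (seconds_per_lifetime / 3600) / 1e6         # ~9 MWh, i.e. ~1e7 Wh

print(f"AI total:    ~{ai_total_mwh:.0f} MWh")
print(f"Brain total: ~{brain_mwh:.0f} MWh")
print(f"Ratio:       ~{ai_total_mwh / brain_mwh:.0f}x")      # ~250x
```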
The conclusion is that even for the limited AI capabilities of current deep learning based systems, the physical and energy resources far exceed those utilized by the human brain. The difference must lie in the way brain resources are organized.
The second problem is the interference between new and prior learning
Interference between new and prior learning is widespread in deep learning networks. When a deep learning system has fully learned one type of task, training it to perform a second type of task reduces or even destroys its ability to perform the first task. Furthermore, even later learning related to the same task can interfere with earlier learning: for example, successful learning of a computer game can be disrupted by later learning of the same game. This type of interference, often called catastrophic forgetting, severely limits the ability to implement a general intelligence able to learn many different types of task. Such interference is much more limited in the human brain.
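The effect is easy to reproduce even in a tiny network. The following sketch (a minimal, hypothetical illustration using only NumPy, not a description of any production system) trains a small one-hidden-layer network on one task, then on a second task, and then re-tests the first task:

```python
# Minimal sketch of catastrophic forgetting in a tiny neural network.
# Purely illustrative: tasks, sizes and hyperparameters are invented.
import numpy as np

rng = np.random.default_rng(0)

def make_task(n, feature):
    """Task: label is 1 when the given input feature is positive."""
    X = rng.normal(size=(n, 10))
    y = (X[:, feature] > 0).astype(float)
    return X, y

# One hidden layer, trained with plain full-batch gradient descent.
W1 = rng.normal(scale=0.5, size=(10, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1));  b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, p.ravel()

def train(X, y, epochs=3000, lr=0.1):
    global W1, b1, W2, b2
    for _ in range(epochs):
        h, p = forward(X)
        err = (p - y)[:, None] / len(y)      # d(loss)/d(logit) for cross-entropy
        gW2 = h.T @ err;  gb2 = err.sum(0)
        dh = err @ W2.T * (1 - h**2)         # back through tanh
        gW1 = X.T @ dh;   gb1 = dh.sum(0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

def accuracy(X, y):
    _, p = forward(X)
    return ((p > 0.5) == (y > 0.5)).mean()

XA, yA = make_task(2000, feature=0)   # task A: sign of feature 0
XB, yB = make_task(2000, feature=1)   # task B: sign of feature 1

train(XA, yA)
print("Task A accuracy after learning A:", accuracy(XA, yA))  # typically > 0.95
train(XB, yB)
print("Task A accuracy after learning B:", accuracy(XA, yA))  # typically falls sharply, toward ~0.5
print("Task B accuracy:", accuracy(XB, yB))                   # typically > 0.95
```

Nothing stops the network from representing both tasks at once; the problem is that sequential training on the second task overwrites the weights that encoded the first.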
The key issue is the architectural difference
Deep learning systems have a very simple conceptual architecture made up of successive layers of perceptrons, each layer connected to the next layer. This is in sharp contrast with biological brains, in which a number of separate subsystems are observed, with obvious differences in device type and connectivity. The major subsystems of this architecture include the cortex, hippocampus, thalamus, basal forebrain, basal ganglia, amygdala, hypothalamus and cerebellum. It is striking that the same subsystems of this architecture are observed across all animal species, including mammals, birds, reptiles and even cephalopods like the octopus and squid.
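To make the contrast concrete, the sketch below (purely illustrative, with arbitrary sizes) shows how uniform a deep learning architecture is: the whole system is a stack of identical layer operations, whereas a brain-like architecture is a set of differently structured subsystems with specific connections between them.

```python
# Purely illustrative: a deep learning system is a uniform stack of layers,
# each connected only to the next. Sizes here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [256, 128, 64, 10]          # every "subsystem" is the same kind of thing
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    for W in weights:                     # information propagates layer by layer
        x = np.maximum(0.0, x @ W)        # identical operation at every stage
    return x

print(forward(rng.normal(size=256)).shape)   # (10,)
```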
Figure: the architecture that appears in human, mammal, reptile, bird and cephalopod brains
This remarkable architectural consistency has some analogy with the observation that almost every current electronic system shares a similar high-level physical architecture, with major subsystems for processing, memory, and a range of peripheral interfaces.
It has been argued that the biological architecture (sometimes called the recommendation architecture) and the ubiquitous computer architecture (sometimes called the instruction architecture, or von Neumann architecture) are the only two architectures capable of performing complex combinations of different tasks without requiring unmanageably huge amounts of physical information processing resources. The difference between them is that the recommendation architecture is capable of learning from experience, while the functions of a system with the instruction architecture must be defined in detail by design. However, a learning system requires more information processing resources than a fully designed system.
The extensive design effort needed for a learning system
The architecture of a biological brain is very complex. At a high level, this architecture is illustrated in the figure above. In the cortex, neurons are organized to define and detect conditions within the information available to the brain. This information includes both current sensory inputs and information about the state of the brain itself. The hippocampal system manages cortical resources, both to minimize resource requirements and to minimize interference between past and current learning. The basal ganglia receive inputs from the cortex indicating key condition detections, and interpret those inputs as recommendations to select and implement the most appropriate current behaviours. Behaviours include motor responses, but also releases of information between different cortical areas, and changes to the recommendation weights of recently selected behaviours. Motor behaviours are implemented by release of motor cortex outputs to drive physical movements, and both types of information release are implemented by the thalamus. The amygdala and hypothalamus act on the cortex to influence the general type of behaviour that will be selected.
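As a purely illustrative caricature of this loop (all conditions, behaviours and numbers are invented for the example, and it is enormously simpler than anything in a real brain), the basic cycle of condition detection, weighted recommendation, behaviour selection and consequence feedback can be sketched as:

```python
# Toy caricature of the recommendation architecture loop described above.
# All conditions, behaviours and numbers are invented for illustration.

CONDITIONS = ["object_is_red", "object_is_moving", "hungry"]
BEHAVIOURS = ["look_closer", "reach_for_object", "ignore"]

# "Cortex": detect which conditions are currently present in the input.
def detect_conditions(sensory_input):
    return [c for c in CONDITIONS if sensory_input.get(c, False)]

# "Basal ganglia": each detected condition carries a recommendation weight
# in favour of each behaviour; the behaviour with the strongest total
# recommendation is selected.
weights = {(c, b): 1.0 for c in CONDITIONS for b in BEHAVIOURS}

def select_behaviour(detected):
    totals = {b: sum(weights[(c, b)] for c in detected) for b in BEHAVIOURS}
    return max(totals, key=totals.get)

# Consequence feedback: adjust the weights of the recently selected behaviour
# for the conditions that recommended it (positive reward strengthens,
# negative reward weakens).
def update_weights(detected, behaviour, reward, lr=0.1):
    for c in detected:
        weights[(c, behaviour)] = max(0.0, weights[(c, behaviour)] + lr * reward)

# One cycle of the loop, roughly the "every ~200 ms" behaviour selection.
def step(sensory_input, reward_fn):
    detected = detect_conditions(sensory_input)    # cortex: condition detection
    behaviour = select_behaviour(detected)         # basal ganglia: recommendation
    reward = reward_fn(behaviour)                  # consequence of the released behaviour
    update_weights(detected, behaviour, reward)    # change to recommendation weights
    return behaviour

# Example: moving red objects should come to be reached for.
def reward_fn(behaviour):
    return 1.0 if behaviour == "reach_for_object" else -0.5

for _ in range(50):
    step({"object_is_red": True, "object_is_moving": True}, reward_fn)
print(select_behaviour(["object_is_red", "object_is_moving"]))  # -> "reach_for_object"
```

The essential point of the sketch is that nothing in the loop is told what to do for any specific input: behaviours come to be selected because their recommendation weights have been adjusted by the consequences of earlier selections.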
All these subsystems have neurons and patterns of internal connectivity suited to their different information processing functions, together with patterns of connectivity between them that allow the brain as a whole to operate. Unless all this neuron function and connectivity is heavily optimized, the brain will be unable to learn effectively. In other words, there must be detailed "design" to achieve an adequate starting point for learning. Of course, for the brain, this "design" consists of perhaps 300 million years of trial and error filtered by natural selection. However, for any artificial general intelligence, human design skills must provide an alternative to this huge natural selection process.
How to construct an artificial general intelligence
Compared with biological brains, deep learning uses a simplistic architecture that cannot avoid the related problems of huge resource requirements and catastrophic interference between new and prior learning. A genuine general intelligence would require design of the connectivity between a number of different subsystems, design of each subsystem to optimize its particular information processes, and design of the different types of devices within each subsystem that support those processes. The most detailed devices could perhaps be simulated on a transistor-based computer system, and the entire system therefore run on relatively standard computing technology, but there is no way to avoid the large human design effort needed to get to that point. Based on the design effort required for the most complex conventional electronic systems, such as a telecommunications switch, this effort would probably amount to thousands of person-years of designer time.