Published Date: 27/01/2025
In 1945, polymath John von Neumann laid out the blueprint for modern digital computers.
Interestingly, the only reference in his 49-page report was to a foundational paper in computational neuroscience, “A logical calculus of the ideas immanent in nervous activity.” Von Neumann recognized the differences between the brain and the computers he helped develop, but the brain served as an inspiration.
He believed the nervous system functioned in a way that could be considered “prima facie digital.” Despite these early connections, the fields of computer science and neuroscience diverged significantly, and the same is now happening with artificial intelligence (AI) and neuroscience.
From the beginning, AI and neuroscience have been closely linked, with natural intelligence serving as a model for AI.
Many AI approaches, such as artificial neural networks (ANNs), draw from foundational neuroscientific principles.
For instance, information in the brain is stored in the weights of connections between neurons, a concept that has been adapted in ANNs.
Other techniques have brain-inspired analogues as well: convolutional neural networks (the visual cortex), regularization (homeostatic plasticity), max pooling (lateral inhibition), dropout (synaptic failure), and reinforcement learning (reward-based learning in animals).
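The idea that information lives in connection weights rather than in any single unit can be sketched with a Hebbian outer-product rule, as in a Hopfield-style associative memory. This is a generic illustration of the principle, not a model described in the text:

```python
import numpy as np

# Hebbian learning: "neurons that fire together wire together".
# The stored memory resides in the weight matrix W, distributed
# across all pairwise connections, not in any single neuron.
rng = np.random.default_rng(0)
n = 8
pattern = rng.choice([-1.0, 1.0], size=n)   # a memory to store

W = np.outer(pattern, pattern) / n          # Hebbian outer-product update
np.fill_diagonal(W, 0.0)                    # no self-connections

# Recall: probe the network with a corrupted copy of the memory.
probe = pattern.copy()
probe[:2] *= -1                             # flip two units
recalled = np.sign(W @ probe)               # one synchronous update
print(np.array_equal(recalled, pattern))    # True: the pattern is restored
```

With a single stored pattern, one update step corrects the flipped units because the weights pull every neuron toward the stored configuration.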
However, recent advancements in AI have moved away from neuroscientific principles.
A decade ago, recurrent neural networks (RNNs) seemed to be the way forward for time-dependent tasks like speech recognition and natural language processing.
But this changed with the introduction of transformers in 2017, as described in the groundbreaking paper “Attention is all you need.” Transformers are powerful and innovative, yet they differ significantly from the brain.
They lack the recurrent connections of RNNs and operate in discrete time steps, without any memory of previous states.
They also lack working memory, instead externalizing it by iteratively increasing input length.
Notably, transformers have no internal dynamics or ability to tell time.
For example, ChatGPT cannot respond appropriately to time-based prompts without external programming.
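The contrast between internal and externalized memory can be sketched with a toy comparison. Both models below are hypothetical stand-ins: a recurrent unit compresses history into a fixed-size state, while a stateless, transformer-style call must re-read the entire growing context at every step:

```python
tokens = [1.0, 2.0, 3.0, 4.0]

# RNN-style: a constant-size internal state, updated step by step.
hidden = 0.0
for x in tokens:
    hidden = 0.5 * hidden + 0.5 * x     # leaky integration of the input

# Transformer-style: no state survives between calls; the "memory"
# is the growing prompt itself, re-processed in full each time.
outputs, context = [], []
for x in tokens:
    context.append(x)
    outputs.append(sum(context) / len(context))  # attention-like pooling

print(round(hidden, 4))   # 3.0625 — history compressed into one number
print(outputs)            # [1.0, 1.5, 2.0, 2.5] — each from the full context
```

The recurrent state costs constant memory per step; the stateless approach re-processes an input that grows with every token, which is one reason context length is so costly for transformers.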
The brain encodes time and recent sensory information through dynamic internal processes, including short-term synaptic plasticity.
In contrast, transformers use positional encoding, which tags words or tokens with positional information (first, second, etc.).
Dispensing with recurrence in this way also sidesteps problems like exploding or vanishing gradients, in which error signals degrade as they are backpropagated through many time steps.
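The sinusoidal positional encoding from the 2017 paper can be written in a few lines of NumPy. Each position is tagged with sines and cosines of geometrically spaced frequencies, which are added to the token embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding, as in the original transformer."""
    pos = np.arange(seq_len)[:, None]               # token positions 0..seq_len-1
    i = np.arange(d_model // 2)[None, :]            # index over dimension pairs
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dims: sine
    pe[:, 1::2] = np.cos(angles)                    # odd dims: cosine
    return pe

pe = positional_encoding(seq_len=50, d_model=16)
print(pe.shape)             # (50, 16)
print(pe[0, 0], pe[0, 1])   # position 0: sin(0) = 0.0, cos(0) = 1.0
```

Note that position is supplied as static input features; nothing in the model itself evolves in time, which is the point the text is making.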
Transformers operate in a block universe, where past, present, and future tokens are simultaneously accessible, unlike RNNs, which function in a presentist universe, processing one moment at a time.
The attention mechanism in transformers, despite its biological-sounding name, is quite different from what cognitive neuroscientists consider attention.
It assigns values to the strength of relationships between word pairs, rather than modulating information based on expectations or volition.
This mechanism is mathematically complex and difficult to implement with biological neurons, as it involves multiplying activity vectors together, an operation that biological neurons have no straightforward way to perform.
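The pairwise weighting described above is scaled dot-product attention, the core operation introduced in the 2017 paper. The QKᵀ product below is exactly the multiplication of activity vectors that the text flags as awkward for neurons; a minimal NumPy sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(QK^T / sqrt(d_k)) V — weight every token pair, then mix."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # strength of each query-key pair
    weights = softmax(scores, axis=-1)  # each row is a probability vector
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_k = 4, 8
Q = rng.standard_normal((n_tokens, d_k))
K = rng.standard_normal((n_tokens, d_k))
V = rng.standard_normal((n_tokens, d_k))

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)                          # (4, 8)
print(np.allclose(w.sum(axis=1), 1.0))    # True: rows sum to one
```

Nothing here depends on expectation or volition: the weights are a fixed function of the current token vectors, which is why the name "attention" is only loosely related to its cognitive namesake.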
Despite their success, transformers have limitations, including high energy consumption.
This has led the AI field to reevaluate RNN-like approaches, such as long short-term memory (LSTM) networks, gated recurrent units (GRUs), and Mamba.
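What distinguishes these architectures is a hidden state carried from step to step. A minimal GRU cell can be sketched as follows; the random weights are placeholders, and the gating convention (1 − z)·h + z·h̃ is one of two common formulations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU update: gates decide how much of the old state to keep."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h)) # candidate state
    return (1 - z) * h + z * h_cand         # blend old state with candidate

rng = np.random.default_rng(1)
d_in, d_h = 3, 5
params = [rng.standard_normal(shape) * 0.1
          for shape in [(d_h, d_in), (d_h, d_h)] * 3]

# The hidden state h is the network's internal memory, carried across steps.
h = np.zeros(d_h)
for t in range(10):
    x = rng.standard_normal(d_in)
    h = gru_step(x, h, params)
print(h.shape)  # (5,) — a fixed-size summary of the whole input history
```

Even so, as the text notes, gates and matrix products of this kind remain only a loose abstraction of real neural circuits.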
However, these architectures often lack the biological realism of neural circuits, partly because they are implemented on digital computers that can perform a wide range of mathematical operations.
As long as AIs run on digital computers, their development will be influenced by Moore's Law, while neuroscience will progress more gradually.
The hardware on which AIs run is crucial for understanding their potential for sentience.
Digital computers operate in discrete time and, unlike the brain, can be paused or have their clock speed changed.
If an AI simulation is claimed to be conscious, slowing the clock speed to one cycle per year would freeze it in a subjective state.
Most consciousness theories, like global workspace and higher-order theories, assume that consciousness is associated with continuous brain dynamics, similar to music flowing through time.
Depending on the hardware (CPU, GPU, or TPU) and the number of cores, the states within a single time step of an AI may not all update simultaneously, which has implications for any putative conscious states.
An exception is the controversial integrated-information theory (IIT), which quantifies how much the current state of a system constrains past and future states.
IIT claims this quantity is directly equivalent to consciousness.
However, IIT is defined only for discrete systems, making it difficult to apply to continuous physical systems, especially the brain.
Therefore, current theories of consciousness, which align with the brain as a dynamical system, suggest that sentience is unlikely in AIs running on discrete von Neumann architectures.
AI and neuroscience will continue to interact synergistically.
AI will keep borrowing insights from neuroscience, but AI may have more to offer neuroscience in the future.
Neuroscientists have been slow to fully integrate early lessons from AI, such as the limited value of a complete connectome.
Every connection, weight, and bias of ChatGPT is known, yet this knowledge has not led to a deep understanding of its workings.
This suggests that neuroscientists need to reevaluate what it means to understand the emergent properties of complex, distributed systems like the brain.
Computer science has evolved independently of neuroscience because the brain has no exclusive rights on information processing.
Similarly, AI and neuroscience will continue to diverge because the brain has no exclusive rights on creating intelligence.
Q: What is the main difference between transformers and recurrent neural networks (RNNs)?
A: Transformers lack recurrent connections and operate in discrete time steps without any memory of previous states, while RNNs carry information forward through recurrent connections, giving them a form of internal memory.
Q: How do transformers handle time and sequence information?
A: Transformers use positional encoding to tag words or tokens with positional information, allowing them to handle sequence information effectively.
Q: Why is the attention mechanism in transformers considered non-biological?
A: The attention mechanism in transformers assigns values to the strength of relationships between word pairs, unlike biological attention, which modulates information based on expectations or volition.
Q: What are the limitations of transformers despite their success?
A: Transformers have high energy consumption and lack internal dynamics and the ability to tell time, which can limit their applications in certain contexts.
Q: Can AIs running on conventional computers achieve sentience according to current theories of consciousness?
A: Most theories of consciousness suggest that sentience is unlikely in AIs running on discrete von Neumann architectures because they lack the continuous-time dynamics of the brain.