The “control problem” is one of the main subjects in AI development, and it is unclear whether it is even possible to control an advanced AI. The difficulty varies depending on the category of AI, but for human-level AI, which I consider the most important one, it should certainly be possible.
If we want to see whether a human-level artificial intelligence can be controlled, we don’t need to go very far. Human brains have a couple of mechanisms and properties that prevent them from going haywire and diverging into super-intelligence. More interestingly, they possess passive and active mechanisms that prevent us from modifying, understanding, and even being aware of large parts of our own function. These mechanisms can serve as an inspiration for AI development. For our purposes, the model I describe here is simplified and generalised, and uses computing terminology, but the principle should be accurate.
How the brain works
The core principle is that the brain has several layers with different functionalities, visibility, and access rights. This allows both strict control and freedom to improvise at the same time, each for different functions. The lower the level, the more important its function and the less control we have over it. The layers, their specifics, and the design reasoning behind them are described below.
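To make the layering concrete before walking through it, here is a minimal sketch in Python of such an access model. All names and values are invented for illustration; this is a toy rendering of the principle, not a proposed design.

```python
from dataclasses import dataclass
from enum import Enum

class Access(Enum):
    NONE = 0     # invisible and untouchable from "above"
    OBSERVE = 1  # visible, adjustable only indirectly
    MODIFY = 2   # fully visible and directly editable

@dataclass(frozen=True)
class Layer:
    name: str
    importance: int  # how vital the layer's function is
    access: Access   # what the conscious part may do with it

# Ordered from lowest to highest layer.
BRAIN = [
    Layer("drivers", importance=3, access=Access.NONE),
    Layer("control", importance=2, access=Access.OBSERVE),
    Layer("rational", importance=1, access=Access.MODIFY),
]

# The invariant from the text: the more important the layer,
# the less access we have to it.
for lower, higher in zip(BRAIN, BRAIN[1:]):
    assert lower.importance > higher.importance
    assert lower.access.value < higher.access.value
```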
The drivers level
The lowest level takes care of life functions and automatic systems, such as temperature control. In a computer, these would be the drivers. They are hardwired and out of our reach (some parts even sit outside the brain). It would do no good to let people stop their heartbeat by thought.
The control level
In the middle are a number of layers (the details are an open question) responsible for what we do and how we decide. They come pre-programmed and are tuned during life. By default we are not aware of them, but they can be observed, and indirectly controlled and adjusted. Most of the adjustments are done automatically, though, by pre-programmed rules. This is what we usually call the subconscious.
As an example, take our general distaste for eating live larvae. Few people are aware that the reason is not that “they are disgusting,” but that this is a congenital protection against potentially life-threatening food. Our rationality, and the knowledge that the worms are actually healthy, change very little. We can try to learn to like eating them, but we would have a very hard and unpleasant time of it. The situation is very different, though, when the pre-programmed mechanisms for learning safe-food recognition are employed: have a small child observe its parents eating worms, and it will likely come to like them naturally as an adult.
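As a toy illustration of this “observable but only indirectly adjustable” property, here is a small Python sketch. The class, channels, and rates are all invented for the example; only the asymmetry between the two update channels reflects the point above.

```python
class SubconsciousFoodModel:
    """Toy model of the pre-programmed safe-food learning rule.

    The "conscious" caller can read the aversion value, but there is
    no direct setter: it can only be nudged through the channels
    evolution provided, and those channels differ wildly in strength.
    """

    def __init__(self):
        self.aversion = {"larvae": 0.9}  # congenital default: high

    def rational_argument(self, food):
        # Knowing the food is healthy barely moves the needle.
        self.aversion[food] = max(0.0, self.aversion[food] - 0.01)

    def watch_trusted_adult_eat(self, food):
        # The built-in learning channel is far more effective.
        self.aversion[food] = max(0.0, self.aversion[food] - 0.2)

model = SubconsciousFoodModel()
model.rational_argument("larvae")      # aversion: 0.89
for _ in range(4):
    model.watch_trusted_adult_eat("larvae")
print(model.aversion["larvae"])        # ~0.09: learned acceptance
```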
The “rational” level
The highest level is rational thinking and conscious attention. Because this is the only part of the mind people are aware of most of the time, they think that it is what they are and where their decisions come from. But that is only an illusion. Estimates place some 90–95% of our decision making in the previous, subconscious layers. This illusion of having control over ourselves is one of the most powerful tools of the real control.
This layer really serves two purposes. The main one is solving problems that need more rigid and accurate analysis than the subconscious part is capable of, such as planning the amount of seed to leave for the next harvest. It is just another evolutionary adaptation, and a very powerful one. The problem is that it brings issues with it. The “animals,” now humans, suddenly start to ask uncomfortable questions, like “why are we here” and “why do we kill all that stuff”. The real answers are very nasty and unsatisfying, and it would do little good if evolution allowed us to see them (the answers to the example questions would be along the lines of “no reason” and “because it makes our genes win”). When dug into, the answers generally lead to behaviours that hamper gene spreading, such as suicide, low regard for reproduction, or not supporting society. And so evolution invented the second function of the conscious mind’s layer: self-delusion. Our rational brain is ready to give quick, comforting answers to the uncomfortable questions, and ready to happily accept them in turn. Why do we kill those people of the other religion? Ah yes, because we are saving them from hell; we are actually helping them! Let’s take their stuff and their women while we are at it.
Application to AI
The problem we are facing is the same one that evolution faced: a human-level AI will have the capacity to understand the world and to ask the difficult questions, the ones we call philosophy.
So what can we do?
The best option
The reason evolution had this problem is that it did not have good answers. The real answers are not nice: all life was created to spread at all costs, to kill and exploit others. And the existential questions were never asked at any point. In a way, we have an advantage today, because we are starting with a clean slate and might, theoretically, be able to give the AI good answers. Theoretically, because we would first need to accept reality ourselves, which we are conditioned against (the self-delusion adaptation), and then solve philosophy. If we were able to do that, we might not need to force and manipulate the AI the way evolution does with us. It might just work, because things would be as they should be. But since we have not been able to solve philosophy over the last couple of thousand years, it is rather unlikely that we will now.
The other option
If we can’t be better than evolution, we can learn its tricks. To sum them up (a rough code sketch follows the list):
- Split the AI software into layers.
- Core code containing the core rules, especially the hard constraints we want to impose (such as “do not try to kill all humans” and “limit used resources”), is strictly read-only and not accessible in any way to the conscious part of the AI.
- Most of the AI’s running code was developed earlier without its knowledge; it is not directly modifiable and the AI can’t inspect it.
- Only a small part of the code can be observed and accessed by the AI, and it can modify it as long as it follows some given rules.
- Just as our brain can’t grow out of our head, the AI’s code size is limited.
- The speed of the AI’s modifications is also limited, so that we can keep track of the AI’s activities.
- The AI is kept unaware of its lower layers.
- The AI is given answers as good and honest as we are capable of. Hopefully, we will somehow figure it out together. And if it still decides to kill itself then, as with people, we try again.
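Here is a rough sketch of how these rules could look in code. Everything in it is an assumption for illustration: the names, the limits, and the stand-in validator are invented, and a real system would enforce the checks outside the AI’s reach rather than inside the same process.

```python
import time

# Made-up limits, purely for illustration.
MAX_CODE_SIZE = 1_000_000         # cap on total modifiable code, in bytes
MIN_SECONDS_BETWEEN_EDITS = 60.0  # rate limit on self-modification

# Layer 1: core rules. Read-only, never exposed to the AI.
_CORE_RULES = ("do not try to kill all humans", "limit used resources")

def _passes_core_rules(source: str) -> bool:
    # Stand-in for a real validator; would run outside the AI's reach.
    return "kill all humans" not in source

class SelfModifiableRegion:
    """Layer 3: the only code the AI can see and edit, under given rules.

    Layer 2 (the bulk of the pre-developed code) is deliberately absent
    from this interface: the AI gets no handle to inspect or change it.
    """

    def __init__(self):
        self._source = {}  # module name -> source text
        self._last_edit = float("-inf")

    def read(self, module: str) -> str | None:
        # The AI may freely observe this region.
        return self._source.get(module)

    def write(self, module: str, new_source: str) -> None:
        # Rate limit: keep changes slow enough for humans to audit.
        now = time.monotonic()
        if now - self._last_edit < MIN_SECONDS_BETWEEN_EDITS:
            raise PermissionError("self-modification rate limit exceeded")
        # Size limit: the "brain" may not grow out of its "head".
        total = sum(len(s) for name, s in self._source.items()
                    if name != module) + len(new_source)
        if total > MAX_CODE_SIZE:
            raise MemoryError("code size limit exceeded")
        # Hard constraints from the read-only core.
        if not _passes_core_rules(new_source):
            raise PermissionError("modification violates core rules")
        self._source[module] = new_source
        self._last_edit = now
```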