Asked if the race to achieve superhuman artificial intelligence (AI) was inevitable, Stuart Russell, UC Berkeley professor of computer science and leading expert on AI, says yes.
“The idea of intelligent machines is kind of irresistible,” he says, and the desire to make intelligent machines dates back thousands of years. Aristotle himself imagined a future in which “the plectrum could pluck itself” and “the loom could weave the cloth.” But the stakes of this future are incredibly high. As Russell told his audience during a talk he gave in London in 2013, “Success would be the biggest event in human history … and perhaps the last event in human history.”
For better or worse, we’re drawing ever closer to that vision. Services like Google Maps and the recommendation engines that drive online shopping sites like Amazon may seem innocuous, but advanced versions of those same algorithms are enabling AI that is more nefarious. (Think doctored news videos and targeted political propaganda.)
Some AI researchers assure us that we will never be able to create machines with superhuman intelligence, so there is nothing to worry about. But Russell, who runs Berkeley’s Center for Human-Compatible Artificial Intelligence and wrote Artificial Intelligence: A Modern Approach, the standard text on the subject, says we’re hurtling toward disaster. In his forthcoming book, Human Compatible: Artificial Intelligence and the Problem of Control, he compares AI optimists to the bus driver who, as he accelerates toward a cliff, assures the passengers they needn’t worry—he’ll run out of gas before they reach the precipice.
“I think this is just dishonest,” Russell says. “I don’t even believe that they believe it. It’s just a defensive maneuver to avoid having to think about the direction that they’re heading.”
The problem isn’t AI itself, but the way it’s designed. Algorithms are inherently Machiavellian; they will use any means to achieve their objective. With the wrong objective, Russell says, the consequences can be disastrous. “It’s bad engineering.”
Proposing a solution to AI’s fundamental “design error” is the goal of Professor Russell’s new book, which comes out in October. In advance of publication, we sat down to discuss the state of AI and how we can avoid plunging off the edge.
This conversation has been edited for length and clarity.
You’re hardly alone in sounding the alarm about artificial intelligence—I’m thinking of people like Elon Musk and Stephen Hawking. What’s fueling these fears?
The main issue is: What happens when machines become sufficiently intelligent that they’re difficult to control?
Anyone who’s ever tried to keep an octopus will tell you that they’re sufficiently smart that they’re really hard to keep in one place. They find ways of escaping, they can open doors, they can squeeze under doors, they can find their way around—because they’re smart. So if you make machines that are potentially more intelligent than us, then, a priori, it’s far from obvious how to control those machines and how to avoid consequences that are negative for human beings. That’s the nature of the problem.
You can draw an analogy to what would happen if a superior alien species landed on Earth. How would we control them? And the answer is: You wouldn’t. We’d be toast. In order to not be toast, we have to take advantage of the fact that this is not an alien species, but this is something that we design. So how do we design machines that are going to be more intelligent and more powerful than us in such a way that they never have any power over us?
Elon Musk uses very colorful language. It’s also true that Elon Musk and Stephen Hawking are not AI researchers. But I think to some extent that gives them a more objective view of this. They’re not defensive about AI, because that’s not their career. I think a lot of our researchers are defensive about it, and that causes them to try to come up with reasons not to pay attention to the risk.
You’re an AI researcher. Don’t you have that same defensive reaction?
To be honest, I did feel it initially when I first read what some people outside the field were saying.
I’ve been concerned about this general question for a long time, since I started at Berkeley—so that’s 30 years or more. What happens if we succeed? It seemed to me, no one in the field had any answer for that question.
The Hollywood version is always: “Well, the AI is going to destroy you, because it’s going to become conscious. And when it becomes conscious, it’s going to hate you. And when it hates you, then it’s going to destroy you.” That whole narrative is just nonsense. AI systems, as far as we know, are not going to be conscious. It also doesn’t make sense to say they would necessarily hate you just because they’re conscious. They’re made of software, and the software follows the rules. Whether or not it’s conscious doesn’t make any difference. It’s still executing one line of code after another. The consciousness is a complete red herring.
So, what is the real nightmare scenario, as you see it?
To some extent you can already see hints of what it looks like. Look at social media content-selection algorithms—the algorithms that decide what you see in your newsfeed, or what video YouTube recommends next. Those algorithms are very simple AI algorithms. Many of them are built as what we call “reinforcement learning” algorithms, which are trying to maximize some measurement of financial reward or click-through. And in order to do that, they learned by a very simple process how to manipulate human beings and change the kind of person you are so that you become more predictable in your clicking behavior. If you’re more predictable, they can make more money off you. They know exactly what you’re going to click on. So how do they make you predictable? Well, it turns out, by making you more extreme in your views. Many of these algorithms have turned hundreds of millions of people into more extreme versions of themselves.
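The dynamic Russell describes can be caricatured in a few lines of code. This is purely an illustrative toy of my own construction (the model, numbers, and function names are assumptions, not anything a real platform runs): a user holds an opinion on a one-dimensional scale, clicks more often on content close to that opinion, and drifts slightly toward whatever the feed shows. A feed that greedily chases clicks ends up dragging the user toward an extreme, where they become highly predictable.

```python
import random

# Toy model (all names and numbers are illustrative): a user has an opinion
# in [-1, 1]; the feed picks content with some slant in the same range.
# The user clicks more when the slant matches their opinion, and each item
# shown nudges their opinion toward that item's slant.

def click_prob(opinion, slant):
    # Closer match between opinion and slant -> higher click probability.
    return max(0.0, 1.0 - abs(opinion - slant))

def simulate(greedy, steps=10_000, seed=0):
    rng = random.Random(seed)
    opinion, clicks = 0.1, 0
    for _ in range(steps):
        if greedy:
            # Click-chasing shortcut: always show content slightly more
            # extreme than the user, pulling their opinion outward.
            slant = min(1.0, opinion + 0.1)
        else:
            slant = rng.uniform(-1, 1)  # neutral feed: random slant
        if rng.random() < click_prob(opinion, slant):
            clicks += 1
        opinion += 0.01 * (slant - opinion)  # opinion drifts toward content
    return opinion, clicks / steps

print(simulate(greedy=False))  # opinion stays moderate; mediocre click rate
print(simulate(greedy=True))   # opinion pinned near the extreme; higher click rate
```

Run as written, the greedy feed finishes with the user’s opinion near the extreme and a markedly higher click rate than the neutral feed—a cartoon of the manipulation-for-predictability loop described above.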
That’s a really simple example, where the algorithm isn’t particularly intelligent, but it has an impact on a global scale. The problem is that the objective of maximizing click-through is the wrong objective, because the company—Facebook or whatever—is not paying for the externality of destroying democracy. They think they’re maximizing profit, and nothing else matters. Meanwhile, from the point of view of the rest of us, our democracy is ruined, and our society falls apart, which is not what we want. So these AI systems that are maximizing, in a very single-minded way, the single objective, end up having these effects that are extremely harmful.
Now imagine: What if those algorithms were actually intelligent? If the apps had knowledge and opinions, they could manipulate you in a much more deliberate way. Then they could have completely catastrophic effects. They would create any distortion of human beings necessary to increase click-through.
The total dystopian end would be that society would collapse, that people would become completely distrustful of others, completely unwilling to help or comply with normal rules of behavior. I think, to some extent, that’s already happening. As you change the political complexion of the whole population, you increase the probability of war, for example. You change the willingness of the populace to accept cruel policies and laws and regulations, and so on.
If you had a system whose objective is to bring the level of carbon dioxide in the atmosphere back to the pre-industrial level—a climate-change-reversing objective—you’d be happy with that, right?
Yeah, I think so.
OK. But, you know, the best way to do that is to reduce the population.
Get rid of humans?
Get rid of humans.
So this is a problem of having the wrong objective. In the social media scenario, what would be the right objective?
One thing you might do is to say, “Maximize click-through, subject to the constraint that you are not modifying the opinions of the user.” And there are other, non-reinforcement-learning algorithms that adapt to the user’s preferences but don’t modify them. That’s an important distinction.
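The constrained version Russell describes can be sketched as a filter over candidate items. Everything here is an assumption for illustration (the item tuples, the tolerance, and the helper name are mine, not any platform’s actual system): score candidates by predicted clicks, but discard any whose predicted effect on the user’s opinions exceeds a small tolerance.

```python
# Hedged sketch of "maximize click-through, subject to not modifying the
# opinions of the user": rank by predicted clicks, but only among items
# whose predicted opinion shift is below a tolerance. All values illustrative.

def pick_item(items, tolerance=0.01):
    """items: list of (predicted_clicks, predicted_opinion_shift) tuples."""
    allowed = [it for it in items if abs(it[1]) <= tolerance]
    if not allowed:
        return None  # nothing satisfies the constraint; serve nothing
    return max(allowed, key=lambda it: it[0])

items = [
    (0.9, 0.20),   # very clicky, but pushes the user's views hard
    (0.6, 0.005),  # decent clicks, essentially no opinion shift
    (0.4, 0.000),  # safe but dull
]
print(pick_item(items))  # picks the best item that leaves opinions alone
```

The point of the design is that the most clickable item loses to a less clickable one whenever winning the click requires changing the user.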
In general, we have this problem of optimizing while ignoring things that are outside the explicit objective. When we give machines objectives, we are typically ignoring all kinds of other things that we actually care about, such as being alive.
Machines should have an overriding obligation to be of benefit to humans. If their fundamental objective is to be of benefit to humans, but they know that they don’t know what that means, then they’re going to ask questions. They’re going to ask permission before doing something that might impinge on something that we care about.
A traditional machine with a fixed objective is not going to let you switch it off, because then it won’t achieve the objective. Even if it’s Fetch the coffee, or Book me a hotel room, it’s going to figure out ways to prevent you from switching it off. Whereas if it’s uncertain about the objective, then it would understand that the human might want to switch it off to prevent something bad from happening.
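This argument has a simple expected-value version, in the spirit of the “off-switch game” studied by Russell’s group (the distribution and numbers below are made up for illustration). A machine uncertain about the true utility U of its action can either act now, collecting U whatever it turns out to be, or defer to a human who blocks the action exactly when U < 0:

```python
import random

# Hedged numeric sketch (illustrative distribution): the machine's belief
# about its action's true utility U is a standard normal. "Act now" earns U
# regardless; "defer" lets the human switch the machine off when U < 0,
# earning max(U, 0) instead.

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

act_now = sum(samples) / len(samples)                     # E[U]
defer = sum(max(u, 0.0) for u in samples) / len(samples)  # E[max(U, 0)]

print(f"expected value, act now: {act_now:.3f}")
print(f"expected value, defer:   {defer:.3f}")
# Deferring is never worse, since E[max(U, 0)] >= max(E[U], 0). But for a
# machine *certain* of its objective the two are equal, so a fixed-objective
# machine has no incentive to leave the off switch enabled.
```

Uncertainty about the objective is what makes the human’s veto valuable to the machine; certainty makes the off switch pure downside.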
What I’m really saying is that the standard model of doing AI is wrong. If there were a way to maximize click-through by killing everyone, algorithms would kill everyone. Fortunately, that doesn’t maximize click-through, so that hasn’t happened yet. But if [algorithms] could find a way, using the screen, to effectively re-engineer our minds to click whenever we were told to click, then that’s what they would do. And they’re just not smart enough yet. A more intelligent algorithm that really understood human neuropsychology could probably figure out a way to use the screen to literally control our behavior in a very direct and enforceable manner that we wouldn’t be able to resist.
These are really scary hypotheticals. But surely there are some benefits to AI?
The utopian scenario is that we achieve human-level AI in a way that is controllable. That is what we call “provably beneficial,” meaning that we can prove mathematically that systems designed in the right way will be beneficial to human beings.
One way to think about it is: What things in the world right now are easy, and what things are hard? What things are cheap, and what things are really expensive? In the book, I use the example of going to Australia. If this were 1800, and I said, “I need to be in Sydney,” then that would cost the equivalent of several billion dollars, it would take thousands of people, it would take four or five years, and I’d have about a 75 percent chance of dying. Now, I just take out my cell phone, go to United Airlines, tap, tap, tap, tap, tap, and I’m in Sydney tomorrow. And relatively speaking, it’s free. Travel has gone from being this very complicated, dangerous, expensive thing to a service. And if you have human-level AI, then it’s “Everything as a service,” all the things that currently are difficult. Like if I said, “I want to build a new house.” Right now, I can’t just go on my phone and say, tap, tap, tap, tap, tap, and have a new house. It’s expensive, takes a long time, you need to get architects, you need to get permits, then you argue with the construction foreman, and it always costs three times as much. For a lot of people, it’s completely unaffordable—the cost of a house is more than their lifetime GDP.
Whereas AI could make these things similar to the way travel is right now: easily accessible and very cheap.
What costs the money in the end is the involvement of human beings in the process. Even the raw materials are effectively free, except for the labor costs of extraction. By having AI systems doing everything, the cost of material goods becomes essentially free. So everyone in the world can have a very respectable standard of living in material terms. They can have access to high-quality education, to high-quality health care. And I think that would be fabulous.
Do you see potential in addressing existential challenges like climate change?
That’s obviously something that AI systems and computer modeling can help with—technical questions like, What could we actually do to reverse climate change? Could we absorb CO2? Could we reflect more sunlight [away from the planet]?
The last part is much more political. My own belief is that, right now, we could solve the problem; we have just decided not to. Collectively, for whatever reason, we’ve decided that we’re going to commit climate suicide. And that’s our own fault. Now, whether AI systems could help with that, that’s another question.
There’s a temptation to say, “Whatever problem it is, the AI system will be intelligent enough to solve it. Yes, it will figure out how we can live forever. Yes, it will figure out faster-than-light transportation.”
What’s wrong with that temptation?
Because some problems are just unsolvable. It might well be that there are no wormholes. There is no warp drive. And we know that living forever just isn’t feasible.
Some people say, “We can upload our brains into silicon devices.” But there’s no guarantee that even if we knew what that meant—and we have absolutely no idea what it means, despite what all the movies seem to show—that you would actually have a continued existence. Your consciousness may or may not survive any such transition. I would say you’re taking a big, big risk.
In your new book, you talk about employment and how, as intelligent robots take over our manual jobs, we’re going to have to take on more high-level work, like art and psychotherapy. But is it possible that even those things might be taken over by AI, and humans could become irrelevant?
I think that’s possible. And we might need to actually regulate what kinds of functions AI systems could perform.
There are things that we just want other human beings to do. I want to be liked by other human beings, not by machines that like me because that’s how the program’s built. So if I’m going to have lunch with a friend, it’s because that’s another human being, and it’s of value to me because they’re human.
I think you’re right, this might change; we might come to a situation where, in fact, people would rather have lunch with the machine. Because the machine is better at making jokes or flattering your ego.
As AI takes over the jobs we don’t want, is it possible we’ll actually become better at being human?
No. Because if you just take the status quo and add AI to it, you get something like WALL-E, where people stop bothering about knowing anything or being able to do anything because machines know it, machines can do it. Why bother going to school? Why bother doing anything?
I’m sure there will always be people who are not happy being on the Barcalounger all day. But we have to make a real cultural effort to work against that tendency [toward laziness]. Because it’s sort of irreversible. Once we stop the whole process of learning and becoming capable and knowledgeable and skillful, it’s very hard to undo.