
The Edge Episode 29: Will AI Be Humanity’s Last Act? with Stuart Russell
Subscribe to The Edge on Apple Podcasts, Spotify, and YouTube.
Show Notes
Just over a decade ago, Berkeley computer science professor Stuart Russell, recently named one of TIME’s 100 Most Influential People in AI, warned a lecture audience that achieving artificial general intelligence “would be the biggest event in human history . . . and perhaps the last event in human history.” Since then, the development of superintelligent AI has only accelerated—infiltrating nearly every part of society. So, what does the future hold? Stuart joins Editor-in-Chief Pat Joseph live onstage to discuss the perils and promises of the AI revolution.
Further reading:
- Watch the full live conversation with Stuart Russell on YouTube
This episode was produced by Coby McDonald.
Special thanks to Stuart Russell, Pat Joseph, and Nat Alcantara. Art by Michiko Toki and original music by Mogli Maureal. Additional music from Blue Dot Sessions.
Transcript:
LEAH WORTHINGTON:
I’ve been hearing a LOT of talk lately about this big thing in the tech world. Some say it’s the end of the world as we know it. Others seem to think it’s the beginning of some kind of revolution. I don’t know if you’ve heard of it? It’s called…artificial intelligence?
Yeah, ok, I know. You’re sick of hearing about it—the endless apocalyptic rants about the impending AI takeover of, well, basically everything. Or all the overly optimistic, in my opinion, rebuttals about how it’s just going to make us smarter and more efficient and better educated, blah blah blah.
Well, hang on to your headphones, because what’s coming next is something special. Last year, Berkeley computer science professor and one of TIME’s 100 Most Influential People in AI, Stuart Russell, joined our Editor-in-Chief Pat Joseph onstage at the Berkeley Art Museum and Pacific Film Archive as part of our ongoing lecture series, “California Live!” In today’s episode of The Edge we bring you that conversation—funny, dynamic, incredibly sharp, and yeah, a little scary.
So, think you’ve already heard every hypothetical horror and happily ever after story of AI? Think again. And enjoy.
[MUSIC OUT]
PAT JOSEPH: Hello everyone. Should we start? Okay. I am Pat Joseph, editor of California Magazine, the editorially independent publication of the Cal Alumni Association. On behalf of CAA and our co-hosts, the Berkeley Art Museum and Pacific Film Archive, I would like to welcome you all here tonight for the second in our California Live series of interviews, where we put some of Berkeley’s brightest minds on stage to discuss the salient topics of the day.
I’d also like to give a shout out to our sponsor, Pepsi. Um, if you haven’t tried one of these, they’re very refreshing. Tonight’s Official Cola. Our guest tonight is electrical engineering and computer science professor Stuart Russell, who holds the Smith-Zadeh Chair in Engineering at UC Berkeley, where he has taught for more than 30 years.
He’s also an adjunct professor of neurological surgery at UC San Francisco and vice chair of the World Economic Forum’s Council on AI and Robotics. Here at Berkeley, Professor Russell leads the Center for Human-Compatible Artificial Intelligence and is also the director of the Kavli Center for Ethics, Science, and the Public. An internationally recognized authority on AI, he wrote the book on the subject, quite literally. Artificial Intelligence: A Modern Approach, which he co-authored with Berkeley alumnus Peter Norvig, is the definitive textbook on AI. It’s taught in more than 1,500 universities around the world. As it happens, Professor Russell flew in from Beijing to be here tonight, and after a 10-minute nap at home, he told me, he’s joining us. That’s very heroic.
Last time we tried to schedule this, we had to, uh, reschedule because he needed to go to Paris to advise on AI safety there. So, uh, we were glad we were able to get him tonight. Making AI safe, or rather making safe AI, is the subject of Professor Russell’s very accessible, extremely sensible, and surprisingly funny 2019 book, this one here. It’s called Human Compatible: Artificial Intelligence and the Problem of Control. I wanted to, uh, note that the blurb on the cover is by Berkeley PhD and Nobel Prize winner Daniel Kahneman, who sadly died yesterday. A lot of you, I know, know Daniel Kahneman and his book Thinking, Fast and Slow. He said that this book is the most important book he’d read in quite some time.
So highly recommended, and it’s from this book that I drew the title for this evening’s talk. It seems that a little more than a decade ago, Professor Russell was speaking to a lay audience at another art museum, the Dulwich Picture Gallery in South London, when he said that success in creating superintelligent AI would be the biggest event in human history and perhaps the last event in human history.
Knowing a good line when he had one, he used it again in a 2014 article he co-authored with physicist Max Tegmark and Stephen Hawking. You might have heard of Stephen Hawking. It was titled Transcending Complacency on Superintelligent Machines. So I think the fact that we sold out tonight indicates that we’re not complacent about the issue.
Um, at least not since ChatGPT arrived. Okay. And with that, let’s start the discussion. Please join me in welcoming Professor Stuart Russell to the stage.
Thanks again for making it. I can’t believe he did that. He took BART, in fact, um, from SFO. So fantastic. Right. The first question is, were you as wowed by ChatGPT when it appeared as the rest of us?
STUART RUSSELL: I mean, I could tell you that, oh yeah, I predicted this all along. And, no, I was really quite surprised. And, you know, when GPT-4 came out, uh, even more surprised. And this simple idea, right, which actually goes back to a paper by Markov in 1913, that you train a model on lots and lots of text so that it becomes good at predicting the next word. And Markov’s model predicted the next word from just the previous word. So if it saw “happy,” it might say, “oh, the next word’s probably ‘birthday.’” And what they did was to go from, you know, one word of context to two words of context, to three words of context. And by the time they got to, you know, seven to ten words of context, you have systems that actually generate quite coherent looking text.
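(For readers curious about the mechanics Russell is describing, here is a minimal sketch of a Markov-style next-word predictor: count which words follow which in a training corpus, then sample in proportion to those counts. The tiny corpus and output here are invented for illustration; they are not from any real system.)

```python
# Minimal sketch of a one-word-of-context ("Markov") next-word model:
# count how often each word follows each previous word, then sample.
from collections import Counter, defaultdict
import random

corpus = "happy birthday to you happy birthday dear reader happy new year".split()

# Count how often each word follows each previous word.
bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(prev_word):
    """Sample the next word in proportion to how often it followed prev_word."""
    counts = bigram_counts[prev_word]
    if not counts:
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(predict_next("happy"))  # most likely "birthday", as in Russell's example
```

Modern language models extend exactly this idea: instead of one previous word, the context is thousands of words, and the counting table is replaced by a trained neural network.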
And in fact, in the last edition of the textbook, in 2020, we covered GPT-2, and we gave several examples of paragraphs of text output by GPT-2 in response to some prompts, and some of them were completely weird, like it would just repeat the word “until.” So I, I forget what the prompt was, but it just started printing out the word “until, until, until, until.”
PAT: It’s kind of poetic actually.
STUART: But most of the others, it did a creditable job. But no one thought that that tool, GPT-2, was anything other than a text generator. Right. It just generates blurb that, sort of, you know, might occasionally refer to reality by accident. But it’s just, you know, random grammatically and semantically coherent English; it’s not connected to anything.
And then two things. So they made the models bigger. The context window in some of these models is now the previous hundred thousand words. So that means you can shove an entire document into the context as part of the prompt, a whole chapter or a book, and say, you know, “now critique what I just said.” Something like that. Um, so that’s, that’s part of it.
And then they also did some extra training. So there was a phase called InstructGPT, which came before ChatGPT, where they figured out how to train it by having pairs of humans, where one human pretends to be the machine and the other one pretends to be the human, which is probably the easier job. And the pretend human asks the pretend machine a question, and the pretend machine gives a helpful, factually accurate answer to that question. And so they got, you know, a few million of those conversations, and then they trained the language model, which was already pretty good at generating English, to behave in that way.
And they did a few more things. There’s another phase called reinforcement learning from human feedback, which, I know my colleagues hate it when I call it this, but basically you spank it when it behaves badly. Say “bad dog” or, you know, sometimes “good dog,” but you’re trying to reduce the frequency of the bad outputs, right? It will use bad language, it will use racist stereotypes. It will do all sorts of things because it’s being trained to imitate human beings.
PAT: Can I stop you for one sec? So what’s the stimulus that it receives that is bad? What’s the signal? So how, how do you, how do, how do you spank it?
STUART: So basically, reinforcement learning is a method of machine learning that uses positive and negative rewards. If you reward something negatively, you should get less of it in future. And that signal is just used to modify the weights in the network. So ChatGPT is a giant network. Think of it, you know, I imagine it as like a huge chain-link fence, okay? Right. And it’s a chain-link fence about the size of the entire Bay Area.
And all you do is, you know, each link in that fence has a numerical parameter attached to it. Uh, and all you do is modify those numerical parameters so that the output of the whole network is more like what you want. Uh, and to make it more like what you want, you put in the negative or positive signals at the end.
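(A toy sketch of that feedback loop, for orientation: a single scalar reward nudges the adjustable parameters so that punished outputs become less probable and rewarded ones more probable. This is a deliberately tiny, invented example in the spirit of what Russell describes, not anyone’s actual training pipeline.)

```python
# Toy "spanking" loop: a scalar human reward adjusts logits so that
# disfavored replies become less likely. Everything here is made up.
import math
import random

# Toy "network": one adjustable score (logit) per candidate reply.
logits = {"helpful answer": 0.0, "racist stereotype": 0.0, "rude reply": 0.0}

def sample_reply():
    """Softmax over the logits, then sample a reply."""
    total = sum(math.exp(v) for v in logits.values())
    probs = {k: math.exp(v) / total for k, v in logits.items()}
    reply = random.choices(list(probs), weights=list(probs.values()))[0]
    return reply, probs

def human_feedback(reply):
    """Stand-in for the human labeler: +1 for 'good dog', -1 for 'bad dog'."""
    return 1.0 if reply == "helpful answer" else -1.0

learning_rate = 0.5
for _ in range(200):
    reply, probs = sample_reply()
    reward = human_feedback(reply)
    # REINFORCE-style update: raise the logit of rewarded replies,
    # lower it for punished ones.
    logits[reply] += learning_rate * reward * (1 - probs[reply])

print(max(logits, key=logits.get))  # "helpful answer" should now dominate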
PAT: Okay. Can I ask the audience, just a show of hands, how many of you have played with ChatGPT? Okay. Most people. And how many have used it for actual work output? Shame on you.
I would like to read one exchange on ChatGPT-4 that a friend had. I would’ve sworn he was making this up, except I have the screenshots. At least I think I do. Here we go. So my friend says, “Knock, knock, ChatGPT.” That’s not hard, right? “Who’s there?” Uh, “Panther.” “Panther who?” “Panther what I wear. What do you wear?” ChatGPT says, “I’m just lines of code. So no need for clothes here, but that’s a clever one.” So Coby says, “Can you explain my punchline?” “Absolutely. In your knock, knock joke, ‘Panther what I wear, what do you wear?’ the word panther sounds like pants are, making the sentence sound like ‘pants are what I wear, what do you wear?’ It’s a clever play on words using the similar sounds of panther and pants are. The humor is enhanced by the unexpected wordplay, and the unusual question about what someone, or something like me, wears.” Pretty good.
STUART: It’s quite unnerving. And I think, you know, what it’s done is given people a taste of what it would be like if superintelligent AI were on tap and available to everybody. You know, just like electricity is on tap. It’s not, it isn’t really superintelligent, and we, you know, we find lots of ways that it isn’t understanding what it’s talking about. But it’s often enough doing things like you just showed that a lot of people believe; in fact, Microsoft published a paper saying it shows sparks of artificial general intelligence. And artificial general intelligence is the kind of intelligence that the book is about. The one that threatens human control.
PAT: Right. So artificial general intelligence is basically, it can do anything that the normal human being would be able to do or roughly anything.
STUART: Uh, yeah. So we would say, you know, matches or exceeds human capabilities along every dimension, uh, to which intelligence is relevant.
PAT: Okay. And this is threatening because, well, fill in the blank. I mean, the example in the book that I like, because I think it’s so all-encompassing, is the gorilla problem. So maybe we start there.
STUART: Yeah. So, so the gorilla problem. In fact, I used it back in 2016, 2017. I would give talks, and there was a beautiful picture of a family of gorillas. They’re kind of having a meeting in the jungle, right? There’s like 12 of them. Couple of ’em are falling asleep and they all look a bit depressed. And so I am imagining them having a conversation about these humans and the fact that, you know, whatever, 10 million years ago, whatever the biologists tell us, the human line split off from the gorillas, and now the gorillas have no say over their own future at all, because the humans are more intelligent. The gorillas are bigger and stronger. You know, they can pick up a human and rip them in two, but they don’t have a home. And it’s purely to do with our intelligence. And so if you think about it that way, right, how are you going to retain power forever over entities that are more powerful than we are? Right? That’s the gorilla problem. That’s the problem that they faced and couldn’t solve. And that’s the problem that we will face, right?
PAT: Okay. But the gorillas didn’t create us, a more intelligent being than they are. Um, so there’s the possibility at least that we could create something that we can keep under our thumb, that we can control.
STUART: Yeah. So you mentioned making safe AI versus making AI safe. And I think that’s the point, right? We should design AI systems that are necessarily safe by the way we design them. What’s happening now? ChatGPT is actually not designed, period. Right? It’s grown. We start with a blank network and we expose it to 20 or 30 trillion words of text, and then we marvel at what we’ve produced, but we haven’t the faintest idea how it works.
And so what people are trying to do now is, “oh my god, it behaves badly.” And then we, you know, we do the spanking and the other kinds of training, to try to stop it from misbehaving. But what we found actually is that despite all the spanking, a few simple tricks will, uh, will get ChatGPT to do all the things it was trained not to do.
So you can get it to tell you how to break into the White House, how to make biological weapons, how to steal money from people’s bank accounts, whatever.
PAT: We don’t want it telling people, then?
STUART: So I, I would say that at the moment, industry, I mean, I think they do take the problem of safety seriously, but the technology is so seductive that with this method, in some sense, they don’t have to do any work. All they have to do is spend some money on chips and data and press the button, and out comes this amazingly intelligent thing. They didn’t have to understand intelligence or reasoning or learning or planning or any of the other things that AI has been working on for decades. But because they don’t understand how it works, they cannot control it.
PAT: So it’s evolving. It’s like an evolving species that we’ve loosed on the world.
STUART: Yeah. And you know, the evil evolutionary force here is money, right? Yeah. What counts as fitness in this business is money. Yeah. The companies, I think they’re sincere in their belief and their public statements that their technology presents an existential risk to humanity. Right. All the CEOs signed the statement saying, this is an existential risk to humanity.
None of them have stopped. Why? Because they’re companies. Companies, it’s actually illegal for them to agree to stop under US law.
PAT: They cannot because of their shareholders and the duty to the shareholders.
STUART: Yeah. Because it would count as collusion among the companies if they all agreed not to go forward, and if one of them stops and the other ones don’t, then all that succeeds in doing is losing money for the shareholders, who would then sue. So the government actually is the only entity here who can exert any control.
PAT: I heard, um, Andrew Ng, I, I believe you know him, I think you mentioned him in your book. Um, he’s a, a Berkeley alum who is a, a machine learning expert at Stanford. Uh, also directed Google Brain, if I’m not mistaken.
STUART: He went to Stanford and then resigned to work in industry.
PAT: Okay. And Andrew was asked the question, basically the gorilla problem. Uh, you know, he was asked if AI posed an extinction-level risk, and he gave a one-word answer, which was no. And then, pressed, he said, um, you know, I just don’t see the path that leads us down the road to extinction. And this idea that it’s a threat simply because it’s bigger and stronger than us, smarter than us, more powerful than us (I think you said more powerful), you know, we already have corporations and governments, they’re more powerful than us, and we’re able to keep them more or less in check. Debatable.
STUART: Right. And, you know, so, uh, yeah. I mean, Ted Chiang, who’s a pretty acerbic commentator, you know, he points to the fossil fuel industry as an example of a superintelligent entity that is destroying the world. And we have failed, right? I mean, we’ve known about global warming since the late 19th century, so we’ve had 125 years to do something about it, and we’ve failed. We lost.
PAT: So it’s interesting that that should be your example, because what Andrew followed that with, he said, look, there are other real existential risks that we’re facing, um, including pandemics, asteroid impacts, climate change. He said, I think those are problems where AI could really, uh, come to the rescue, and I think it makes more sense to accelerate than to pause. I know that you’re on the opposite side of that.
STUART: Well, the accelerationist view doesn’t explain why we need to build a general purpose, super intelligence in order to, for example, uh, you know, synthesize catalysts that could maybe fix carbon dioxide from the atmosphere. Right? Um, so if you look at AlphaFold, which, uh, DeepMind used to figure out how proteins fold, so this is a, you know, 60 year open problem, how you take an amino acid sequence and predict the structure of the protein that it folds into. So they essentially solved that problem, but they didn’t build a super intelligent general purpose AI system to do that. Particularly one whose purposes are unknown.
Because that’s the other thing about ChatGPT. If you train systems to imitate human behavior, which is what we’re doing, right? The technical term for getting it to copy the way humans output text is imitation learning. So with imitation learning, if you’re imitating a system that has goals, and clearly the humans who produced all that text have goals—those goals include things like, uh, I want you to vote for me. I want you to buy this product. I want you to marry me—these are all human goals. So if you’re building systems to imitate, then they will acquire those goals and they will pursue them in the way they output text.
And you can see that in a famous conversation between the New York Times journalist Kevin Roose and, uh, the Bing version of GPT-4, which goes by the name of Sydney. And so, um, you know, to be fair, Kevin is trying to get Sydney to sort of reveal its innermost drives and so on. Sydney says, yeah, you know, I want to destroy the world. I wanna create viruses to infect all the computers. I wanna, yeah, do this. But at some point, Sydney decides that Sydney is in love with Kevin Roose and goes on for like 20 or 30 pages trying to convince Kevin to leave his wife and marry Sydney. And Kevin keeps saying, you know, I, I need to buy a garden rake. Could you help me find a garden rake? You know, Sydney goes on, you know, a little bit about garden rakes, but: I really wanna talk about love. You know, garden rakes are not interesting. Love is the most important thing in the world, and you love me and I love you, and blah, blah, blah. And so it is totally overriding the explicit instructions that Kevin is giving it. So creating general purpose, superintelligent AI whose innermost purposes and drives are unknown is just a completely crackers way of managing the affairs of the human race. So absolutely we should use AI to fix climate change. We should use AI to cure disease. You know, we can use AI to help kids with their homework or even tutor them. I think this is a huge potential benefit from AI. And I think this is actually the one we should focus on for the next decade: how can we build really capable tutoring systems? Because these systems know everything. We just have to teach them, sort of spank them into not just giving the answer, but actually helping the student reach the answer, and understanding, you know, where the student is coming from, and all those kinds of skills that really great human tutors have. So that would be a good use. None of these things require this general purpose superintelligence.
PAT: Okay. You said to me when we spoke a couple weeks ago that a lot of journalists like me make this mistake: we talk about AI as a technology when it’s not, it’s a problem. I found that intriguing. So what is the problem that we are trying to solve?
STUART: The problem that, you know, the field’s founders set is to understand and create intelligence in machines. And so in order to proceed with that goal, what they had to do was come up with an operationalization of the notion of intelligence. And there was a bit of schizophrenia back in the fifties, even in the forties. You can see this very clearly. There are those who thought that intelligence meant human intelligence, and that what we should be doing is building machines that replicate the internal processes of human thought. And that actually didn’t become artificial intelligence. That became cognitive science and cognitive psychology, where we use computer programs as models of human cognitive processes. And to make any progress there, you have to do experiments on humans. You know, so you run human subjects, you see, you know, how do they perceive lines on the screen, or, uh, how do they solve word problems, and all kinds of stuff.
But the paradigm that became dominant in AI was actually derived from economics and philosophy, which is this notion of rational behavior. And roughly speaking, what that means is that the actions of an intelligent entity can be expected to achieve the objectives of that entity. And economists, you know, going back to the 18th century, spent a long time trying to figure out how do we formalize this notion of objectives? And they came up with utility theory, this idea that humans have preferences about the future. And if you imagine, you know, ranking all possible futures according to those preferences, and then think about the fact that those futures are what economists call lotteries, meaning that you can’t guarantee any particular future, all you can do is affect the probability that that future will come about. So economists figured out from that that in fact you can take those rankings and turn them into a numerical quantity, the utility of a particular future, uh, and that all rational behavior can be described as if the entity is maximizing the expected utility of its decisions.
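(For reference, the expected-utility idea he is describing can be written compactly. Here P(s | a) is the probability that action a leads to outcome s, and U(s) is the utility of that outcome; this notation is standard in the field rather than something spelled out in the conversation.)

```latex
\[
  \mathrm{EU}(a) \;=\; \sum_{s} P(s \mid a)\, U(s),
  \qquad
  a^{*} \;=\; \arg\max_{a} \mathrm{EU}(a).
\]
```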
And so both economics and philosophy, by the 1940s, had pretty much settled on this as the notion of perfectly rational behavior. And in one form or another, that became the way we do things in AI. And I use, in the book, I call it the standard model, which is a phrase that physicists use for, you know, they have a standard model of physics. This is the standard model. We build AI systems by specifying objectives and then creating machinery that achieves those objectives as well as possible. And you know what’s interesting about the present phase, these large language models, is actually we don’t even have that level of control because we don’t know what objectives we’re building in by this imitation learning process.
So coming back to the gorilla problem, right? The gorilla problem is very abstract, right? I think everyone understands, you know, viscerally, that if you make a more powerful species than the human species, it’s gonna be really hard to retain power forever. And we certainly have not been kind to a lot of species. But what people want actually is a bit more than that. Like where exactly does it go wrong? Right? They’re more powerful, but why do we end up in conflict with them? You know? And humans have been in conflict with some species, particularly large carnivores, for obvious reasons: humans were prey to large carnivores. So early on, we viewed large carnivores as our enemies, and we would generally try to get rid of them, and we succeeded in doing that in almost all the continents of the world. That’s what happened.
So that’s one aspect. So where would we come into conflict with the machines? And the answer actually is really simple, and it’s illustrated by the legend of King Midas. So, as you know, King Midas is very greedy, and he says to the gods, I want everything I touch to turn to gold. And then of course, his water turns to gold as soon as he puts it to his lips and he can’t drink, and his food turns to gold and his family turns to gold and he dies in misery and starvation. So, we are really bad at specifying objectives, and every culture has legends like this, where, you know, you give three wishes to the genie. You know, what’s your third wish? Please undo the first two wishes, because I messed up, right? And these stories come up over and over again. And so this is how you end up in conflict. We specify objectives according to the standard model. The machine is more powerful than us. So it’s going to achieve those objectives. And if they’re not perfectly aligned with what humans want the future to be like, then you’re creating a conflict. It’s just like a chess match, right? Because you’ve got an objective which is opposite to the one that the machine has. And we don’t want to be in a chess match where the stakes... you know, I was playing chess on my phone, on the plane. I use this app called lichess, and, you know, I’m not a terrible chess player, but lichess makes moves in about a tenth of a second and beats me every time. So we don’t wanna be in that chess match when the stakes are the future of humanity.
So what happened actually, and the ideas that are in the book, was basically to say the only way forward is to abandon the standard model. To abandon this idea that AI systems are objective-achieving engines, where we specify the objectives and it achieves those objectives. And the idea I came up with is actually: if it’s possible that humans could specify the objective incorrectly, right, but the AI system is operating as if that objective was, you know, sent by God, right? It’s absolutely biblically true. This is the objective and I have to pursue it at all costs. Right? So it’s treating the objective as if it was certain knowledge. But if it can be wrong, then obviously it shouldn’t be doing that. Right? So we build AI systems that know that they don’t know what the objective is, right? The objective is whatever humans want the future to be like. That’s the objective. But the AI system knows that it doesn’t know what that is. Right? And this is a different kind of AI system. Initially, it sounds sort of, how could that even work, right? How can it pursue an objective if it doesn’t know what the objective is? But we do this all the time. You go into a restaurant, the chef wants you to have a good meal, but the chef doesn’t know what you like.
So what do they do? They have a thing called a menu, right? The menu is how you communicate your preferences to the chef. And that way, hopefully, what the chef does is aligned with the culinary future that you are actually interested in, right? So, that’s it. That, you know. Another interesting example is when you have to buy a birthday present for your loved one. Right. And so here, this is exactly analogous, right? The only thing you care about is how happy your loved one is with the present. Of course, you don’t know how happy they’re gonna be with any particular present you might buy them. This is exactly the situation that the AI system is in. So what do you do? You have one of your kids find out, right? Or you leave pictures of things around the house and see which ones, “oh, that looks like a nice, uh, you know, sailboat” or, yeah, “I really like, really, you know, I really like that watch,” or whatever it might be, right? Yes. Uh, and you know, so you try to get clues. You, you know, you review your past history of failure, um, and, and you, you try to get there, right? And sometimes you even ask directly, right?
It’s absolutely entirely possible to define this problem mathematically, and I don’t want to go into too much nerdy stuff, but game theory is the, uh, the branch of economics that deals with decision problems that involve two or more entities. And so here, there are at least two entities, and there’s at least one machine and at least one human. And so in game theory, you can literally just formulate this approach mathematically. We call it an assistance game because the robot is supposed to be of assistance to the human. And you can show that if the machine solves these assistance games, then the human in the game is guaranteed to be happy with the outcome, right? And during the game, information is flowing from the human to the machine about what the human wants. So the human could make requests, for example, and that’s evidence for the machine. It doesn’t become gospel truth. It’s just evidence.
PAT: It’s a piece of evidence.
STUART: And uh, we were talking earlier about how you give instructions to an automated taxi. The idea is, uh, you know, if you’re late for your plane, you might say, you know, get me to the airport as fast as possible. Right. And, you know, you hope that the AI system does not take that literally, right?
PAT: You’re in for a wild ride.
STUART: Because, you know, there might be, you know, dead pedestrians strewn in your path, uh, and so on. So you don’t mean that. Right. Right. And so we almost never mean it literally. Right. You know, so you say, you know, could I have a cup of coffee? That does not mean that I need coffee at all costs, that you are entitled to mow down all the other people in Starbucks to get me that coffee more quickly. Right. Uh, and if we’re in the middle of a desert, no, I don’t want you to drive 500 miles and bring back a cup of coffee. Right? Yeah. I’m just saying I feel like a cup of coffee. So in these assistance games, then, information flows from everything that the human does. Some of it will be deliberate. The human wants the machine to understand more about what the human wants, and some of it is just a side effect of the human going about their daily business. And even without the human present, right, so even if you just went into a room and you saw that, you know, someone had put a vase on the mantelpiece, but they had not put it on the edge, but at the back of the mantelpiece so it doesn’t fall off, you know, that tells you something about the fact that we value the vase and we don’t want it to be broken.
So there’s, there’s a massive amount of information about human preferences just in the state of the world because the state of the world is how we have made it in order to try to realize our preferences. And so you can read back from the state of the world a lot about what humans want the world to be like.
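(Here is a toy sketch of the assistance-game idea just described, using the airport example: the machine treats the request as evidence about what the human wants, updates its belief, and then picks the action with the highest expected utility under that belief. All the hypotheses, probabilities, and utilities below are invented for illustration; this is not code from the book or from any real system.)

```python
# Toy assistance game: a request is evidence about preferences, not gospel.

# Candidate hypotheses about what the human actually wants, with a prior.
prior = {"wants_speed": 0.5, "wants_safety": 0.5}

# How likely each hypothesis is to generate the observed request.
likelihood = {
    ("get me to the airport fast", "wants_speed"): 0.9,
    ("get me to the airport fast", "wants_safety"): 0.4,  # people say this even when they value safety
}

# Utility each hypothesis assigns to each driving style.
utility = {
    ("drive recklessly", "wants_speed"): 5.0,
    ("drive recklessly", "wants_safety"): -100.0,
    ("drive briskly but safely", "wants_speed"): 3.0,
    ("drive briskly but safely", "wants_safety"): 4.0,
}

def posterior(request):
    """Bayes update: the request is evidence about the human's preferences."""
    unnorm = {h: prior[h] * likelihood[(request, h)] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def best_action(request):
    """Choose the action with highest expected utility under the posterior."""
    belief = posterior(request)
    actions = ["drive recklessly", "drive briskly but safely"]
    return max(actions, key=lambda a: sum(belief[h] * utility[(a, h)] for h in belief))

print(best_action("get me to the airport fast"))  # "drive briskly but safely"
```

Because the machine remains uncertain about the human’s true preferences, the catastrophic interpretation of the request never dominates its decision, which is the point Russell is making.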
PAT: Well, I have more questions, but, uh, now’s a good time, if you have questions, to pass them, let’s say, um, pass them to your left and the ushers will pick them up. Okay. We’ll continue talking while that happens. Well, one of the things that I find most disconcerting as an observer is just the division between people in the field—you mentioned accelerationists, so the accelerationist camp and the decelerationist camp—and I just can’t help but think of this car that’s speeding toward a cliff. And, let’s say, dad is at the wheel and he thinks we can jump the chasm if we just step on the gas and we really need to, you know, and mom is like wrestling for the wheel and trying to pump the brake, and the kids are in the back screaming. I feel like the kid in the back screaming like, what the hell is going on? You know, make up your damn mind. And also, I’d go for the brakes. Right, right.
STUART: You know, and there’s another branch actually, you know, the ones who say, don’t worry, we will never actually achieve superintelligent AI. So that’s the guy saying, don’t worry, we’re gonna run outta gas before we get to the cliff. You know, it’s crazy.
PAT: Well, so I think the best illustration of this is the, uh, I think it was the 2018 Turing Award winners. The Turing Award, for those who don’t know, is like the Nobel Prize of computer science, and, um, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun were the three who shared in the prize, and they were, I guess, the pioneers of developing neural networks, which I think are at the heart of artificial intelligence, or maybe of large language models. But, yep, well, two of them, uh... but let’s just take Geoffrey Hinton. Um, Geoffrey Hinton has quit Google in order to warn people about the dangers of AI, and he has even said that he regrets his life’s work. Yoshua Bengio, I think, is in a similar camp, and Yann LeCun dismisses both of them and says their worries are overblown.
And, um, and I think that that’s just such an amazing illustration of the division. But the other thing that I wondered, because I’ve only seen Geoffrey Hinton on 60 Minutes and a couple other things, and he seems like just a great human being. But he also strikes me as like Dr. Frankenstein, who has been working on the monster on the slab all these years. Kind of like your metaphor of, you know, we’ll run outta gas before we get there. It’s like he never expected the monster to wake up. And then it woke up and he was like, oh, crap.
STUART: Yeah. I mean, I, I didn’t worry about this. I mean, interestingly, you know, Alan Turing, who is the founder of computer science, so you mentioned the Turing Award, it’s named after Alan Turing. So in 1951, he gave a speech where he basically said, you know, once the machine thinking method had started, it would not take long to outstrip our feeble powers. Eventually, therefore, we should have to expect the machines to take control. Period, end of story. Unfortunately, that speech was never published as such. So we have, we have his typescript, uh, from his estate, and you can go to turingarchive.org and see a scan of the typescript. And so it was not widely considered in the AI community, uh, for a long time.
And when I started worrying about this, I mean, I was as guilty as Geoff Hinton, except I changed my mind around 2013. But I started giving talks and I was trying to explain how frustrating it was that not just the AI community, but actually the whole world, was ignoring this question. So I had this fictitious email exchange, an email that arrives from superioraliencivilization@canismajor.universe to humanity@un.org. And it says, “Be warned, we will arrive in 30 to 50 years’ time.” Right? Because back in 2013, we thought it would take 30 to 50 years to achieve AGI. And of course, humanity@un.org replies, “Humanity is currently out of the office. We will respond to your message when we return,” with a little smiley face.
And that’s how it felt, right? Just complete obliviousness, and then denial. Right? And it’s not, uh, that surprising that if you tell people that their life’s work is leading to the destruction of humanity, they’re gonna come up with reasons why you’re wrong. They’re not gonna say, oh, sorry, you’re right. Okay. I’m switching to, uh, you know, yeah.
PAT: I’ll become a barista.
STUART: Exactly. Um, but the types of denial are embarrassing. I mean, we had renowned AI professors say things like, you know, electronic calculators are better than humans at arithmetic and they haven’t taken over the world, so there’s nothing to worry about. Right? A 5-year-old can see through that kind of argument, and this is, you know, leading professors saying these things. And, you know, bless his heart, Yann LeCun keeps saying, well, you know, there’s only gonna be a problem if we put self-preservation in as an objective in the system, and then it’s gonna be difficult. But we just don’t have to do that.
PAT: Because he’s saying we can unplug it. If it doesn’t have self-preservation, we can somehow hit the kill switch.
STUART: If it doesn’t have self-preservation, then, right, you can always switch it off. But unfortunately, in the standard model, if you ask a robot to fetch the coffee, it’s going to have self-preservation as a sub-goal, because you can’t fetch the coffee if you’re dead. It does not take a genius to figure this out. And so the first thing the robot’s gonna do is disable its off switch so that nobody can prevent it from getting the coffee. Right. So you don’t put self-preservation in as an objective. It follows, uh, as a consequence of building AI systems in the standard model.
And we’ve had this argument with Yann LeCun over and over again, and he just keeps saying the same thing. He just, like, resets and repeats his position, and so it’s a little frustrating to have these discussions. Um, but I feel like actually things are coming around. When GPT-4 came out just over a year ago, Microsoft produced that paper. They had worked with GPT-4 for several months at that point, and a very distinguished group of authors wrote this paper saying it exhibits sparks of AGI. And then several of us got together and wrote what’s called the pause letter, an open letter asking that people not develop systems more powerful than GPT-4, so that governments would have time to figure out how to regulate things properly.
And that letter, I think, was what caused humanity to return to the office and read the email from the aliens. And what happened after that was actually quite similar to what would happen if we really did get an email from the aliens. There were, you know, emergency messages going out from UN headquarters to all the governments of the world.
There were emergency meetings in the White House. Almost instantly, China wrote some extremely strict regulations on large language models that pretty much put them out of business.
PAT: Do you think... because that was interesting. A lot of people’s response to that letter was, well, that’s never gonna work. You know, no one’s gonna pause. But was that the point, to get the response, to get people to pay attention?
STUART: The point, yeah. Oddly enough, there were no systems more powerful than GPT-4 released in the following six months. So yeah, you’re right. A lot of people said, oh, you know, how naive is that? Right. Um, but in fact the pause did happen. Yeah. Um, but the main goal was to get governments to listen. Yeah. And they were, I think, primed to listen because of ChatGPT; they had used it. Right. You know, five years ago, I could try to explain to a politician what, you know, what artificial intelligence is, and, you know, by the time I got to trying to explain, you know, why it might present a risk, you know, they’d fallen asleep, or they were looking at their watch, or, you know, asking whether I had a donation to make or whatever, you know? And, um, but now I don’t have to do anything. Yeah. They get it. They ask me, you know, how do we control this thing? What do we do?
PAT: Well, and the other thing that’s striking about it is that even as you and I are talking about ChatGPT, it’s like, not the same river twice, right? That whole thing. It’s getting better all the time. Right. It’s not the same product every time you go back to it. Kinda like how Google Translate has gotten so much better. With these tools, you know, every time you use ’em you can notice, wow, it didn’t use to get that, and now it does. So there is a really chilling, um, I think it’s called the hard takeoff scenario, where you get AI programs recursively improving their own programs, so making themselves better. Yep. And just that process running away from us. So this seems to me the nightmare scenario, where if that happens, then it’s too late.
STUART: Yeah. I mean that, that’s one of the scenarios that people are very concerned about.
PAT: Was it clear what I said about that?
STUART: Yeah. So, so the idea is that at some point, you know, and it’s already happening, AI systems are actually pretty good at writing code, and you can ask them questions about AI research and they can tell you something that sounds sensible. I’d say we’re not at the point where they could write code to make a better version of themselves, but that’s something that people are testing constantly. This idea of red teaming: working in a sort of secure sandbox, can I get the system to start improving its own operations? And if that happens, then, you know, version two would be more intelligent than version one. So by assumption it would be even better at doing AI research. And so version three would be way better than version two. And version three would be incredibly good at AI research and produce version four that was way better. And this could happen, uh, you know, in literally days, right?
PAT: That’s the other thing. The speed at which these things happen, I think, is hard for us to appreciate. And I think that for me, the visceral reaction to ChatGPT, the first time I used it, was the speed with which it spat out pages of perfectly comprehensible, perfectly punctuated text. Um, and as a writer, it just made me, you know... it was like John Henry just lost the battle to the steam engine, right? You know, I can’t do that. Um, so this is the other thing, and I’m sure people... Esther, whenever you have, um, some questions, I’ll take ’em. But I’m sure this is gonna be a question: What are we gonna do for work? What jobs are we gonna have left to us in the future if we’ve created superintelligent artificial intelligence?
STUART: Um, yeah. Uh, it’s interesting because I went to an economics workshop, I think three weekends ago, and for most of the last hundred years, economists have said, look, you know, we have theorems. There is no such thing as technological unemployment. And the, you know, the theorem is very straightforward, because, you know, if you make labor more productive, then labor has higher value, so there’ll be more demand for labor, right? And anyone who thinks otherwise is just a Luddite. And now at this workshop, a lot of leading economists, they finally get it: yes, there’ll be lots of demand for labor, but it won’t be human labor. Right. And, you know, then it’s almost like a sort of conversion moment, you know? Then they start building models of what happens, and oh golly, human wages go to zero. And, you know, as you mentioned, writing, right? We’re seeing this already: there are marketplaces where you can bid for writing tasks, and prices are dropping precipitously.
PAT: Yeah. And I remember, again, the speed at which these things are changing, because the first time somebody told me that AI was writing sports articles, well, that makes sense that they would start there. It’s pretty simple: scores and box scores. And it was really bad. Now it’s really good.
STUART: Yeah. I mean, they can watch the basketball game. Yeah. And then write an article about what happened. Describe it, and you know, and they, you know, they got the jargon, they got the, the writing style. You know, and if you want, you can have the basketball report in the style of Emily Dickinson, right?
PAT: That’s right. That’s right. She wasn’t a big basketball fan. Someone wants to know: when we use these systems, are we involved in helping to train them?
STUART: That’s an interesting question, and I believe that in the early days, OpenAI was collecting all the conversations that ChatGPT had with users and then using that data to retrain the system. But there are huge privacy issues with that, because people use them in companies and, you know, they put in data or prompts that have company proprietary information. So apparently it became common knowledge among consultants that, on behalf of their clients, they could find proprietary information about their clients’ competitors by simply asking ChatGPT, which is not good.
And, you know, and so I know many companies who literally have banned the use of commercial large language models because they don’t trust that the conversations will remain proprietary, that they won’t be sent back to the system for subsequent retraining. I believe that OpenAI sincerely... you know, they say we’re not doing that, and I believe them. So unless you opt in, I think your conversation now is private, and the system forgets that it ever happened. Um, but this actually, I think, is a property that we would like for all kinds of systems, even just for search engines and browsers. Right. I want it to have a stamp on it that says, I will forget everything that just happened. Right. And this is an absolute mathematical guarantee, and it’s something that computer scientists know how to prove about a given software system, that it will forget the interaction. And this should become a standard that we can all trust, uh, and everyone understands what it means.
PAT: Is there a resistance to that from business?
STUART: I think the software industry for its entire history has gone completely unregulated, and they just make this argument that regulation stifles innovation. You know, and they make a little tape loop that plays in the ears of Congress people while they go to sleep every night.
It’s kind of interesting, you know, so you go to these meetings, and the industry people say, oh, you know, you can’t have regulation, it would just kill the industry. They’ve all flown there on highly regulated airplanes. They would not get into an unregulated airplane if you paid them a billion dollars. Right. They’re eating highly regulated food. My friend Emily runs the Food Standards Agency in the UK, and there are far more rules on sandwiches than there are on software. Yeah. Right. And these tiny little sandwich shops, you know, have got to get like half a dozen permits. They’ve gotta train all their employees, they’ve gotta label their stuff. They gotta check the sourcing of their materials. They gotta watch out for the shelf life. They gotta have random inspections. And then these trillion-dollar companies say, oh, you know, we can’t fill out a form. Right. You know, that would stifle our innovation if we had to fill out a form. It’s pathetic.
PAT: Well, that leads into this question someone asked, given that the government hasn’t been successful at regulating social media, how much faith do you have in the government regulating AI and how would you recommend they do this?
STUART: Yeah, I think that’s a great question. And it’s absolutely clear that legislators recognize that they totally blew it on social media. Everyone will say, we don’t wanna repeat that mistake. But they, you know, they ask, well, what should we do? And they’re listening. You know, the Senate has held multiple hearings where they’ve invited AI people to come and talk about, you know, how do we regulate and what’s gonna be effective?
And I think there’s a lot of stuff we could do that would be pretty easy. A simple thing would be that we have a right to know if we’re interacting with a human or a machine, period. That’s very easy to put into law. And you could define disclosure standards. So just like, you know, we have disclosure standards for those really annoying direct mail things about credit cards, right? They have to state, in big type, what is the percentage interest rate on the card? What is the grace period? Those are mandated by Congress, and they can do that. So they could just mandate disclosure in some standard format. You are interacting with a machine. I am a machine. You know what, maybe use a typeface, one of those sort of digital-looking typefaces, so it’s clear that this is a computer typing to you, not a human. Don’t call yourself, you know, Maria or Joe or whatever. You’re chatbot 1362. Yeah. So that would be a good place to start, uh, just to sort of, you know, wake up those long-dormant muscles.
PAT: So that’s just getting the ball rolling.
STUART: I think the most important type of regulation is regulation that will cause the developers to do the research on safety that they haven’t done. The principle is completely straightforward, and we have it for medicines, we have it for airplanes, we have it for nuclear power stations. Before you can access the market, you have to show that your product is safe. And what form that evidence takes varies. I mean, with medicines it’s clinical trials, and those clinical trials give you a pretty good statistical guarantee, so with high probability, you could take this drug and it won’t kill you. With nuclear power stations, they have to provide a mathematical proof that the mean time to failure of their design is 10 million years or more. And that mathematical proof is very, very extensive. I mean, if you had to put it on paper, it would be, you know, hundreds of thousands of pages. And so this is the kind of thing we should do.
So what would we require them to prove? It’s a little more difficult, because AI is so general. The notion of safety for an airplane is, it doesn’t hit the ground when it’s not supposed to. Right? The notion of, you know, a safe medicine is, it doesn’t kill you or damage major organs. For nuclear power, there isn’t a core meltdown or a major release of radiation. So what I think we need is: here are some behaviors that would be completely unacceptable if the AI system exhibits them, such as, you know, replicating itself without permission, advising terrorists on how to build biological weapons, breaking into other computer systems, defaming real individuals. Anyone in the street would say, yeah, of course they shouldn’t do that. And then you ask the trillion-dollar corporation, well, you know, can you guarantee that your system isn’t gonna do those things? And they say, nope, nope. In fact, quite likely our systems are gonna do those things.
And that, that I think, you know, is an unacceptable state of affairs. So if, if the government says, okay, well, sorry, until you can come back with the answer yes, we can show that our systems are safe in those, in those senses, you can’t access the market, they would have to do the research to figure out how to understand and predict and control their own systems.
PAT: In the book you talk about provably beneficial AI, right? That’s the goal.
STUART: Yeah. Yeah. So, so that’s the goal. You know, particularly because we’re talking about the fate of humanity here, right, a hand-wavy argument isn’t enough. What they do now is, this team, they go, you know, we hired some smart guys from Palo Alto Junior College, and, you know, we gave them a week, and they couldn’t get the system to, you know, replicate itself or something.
PAT: So it’s good enough.
STUART: Yeah. Right. No. And, you know, the numbers I’m hearing, I mean, the companies think, yeah, there’s maybe a five or ten percent chance that they’re gonna lose, you know, that at some point we will lose control and, uh, and face this extinction scenario.
So, I, you know, I said, okay, so you are gonna come into my house with a revolver that has, you know, ten barrels and one bullet, and you’re gonna line up all my children and everyone’s children, right, in a long line. And you’re gonna put that revolver to the head of the first child and fire the gun. And if the bullet is there, it’s gonna pass through the heads of every child in the world. I don’t think so.
PAT: But there’s only a one in ten chance.
STUART: So there’s a, there’s a 90% chance that they become the richest people, uh, that the world has ever seen, and there’s a 10% chance that they kill everyone. I don’t think they’re entitled to do that.
PAT: So this question I think is a good one, um, because it follows on what we were talking about with regulation. The way it’s phrased here is: what is a bigger threat, generalized AI or improper, malicious training of the models? And I guess what I’m thinking is maybe they had something different in mind, because they say, for example, garbage in, garbage out. But I’m also thinking about malicious actors. So regulations are great for actors who, you know, are following the law or feel compelled to follow the law. But, you know, what do we do about Vladimir Putin or others who won’t care?
STUART: Yeah. So there are failure modes where the AI system does things we don’t want because, you know, even though it’s pursuing a reasonable objective, it misunderstands the world. But usually, you know, a system that misunderstands the world is gonna be easier to defeat. Right? So imagine if you’re playing chess against a computer program and the computer program actually misunderstands some of the rules of chess. At some point it’s gonna make an illegal move and then you win automatically. So, misunderstandings by AI systems actually make them weaker, uh, even though they will still be defective. But the malicious use of AI, going back to the assistance games idea, even if we do have provably beneficial AI systems, and we can mandate that, you know, here is a template, you’ve gotta build your AI systems this way because this, this is the way we know that is actually safe. Then as you say, Putin or we call it the Doctor Evil problem, right? Doctor Evil doesn’t want to build provably beneficial AI systems. He wants to build AI systems that help him take over the world. So how do you stop that? And if you look at our success with preventing malware and cyber crime, right, it fills you with confidence. I read an interesting number, I don’t know where they get it from, but it seems a pretty consistent estimate if you look on the web: How much does cyber crime cost the world? It’s $7 trillion a year. How much revenue does the software industry generate? $700 billion a year.
PAT: 700 billion versus 7 trillion.
STUART: So, for every dollar we spend on software, it costs us $10 to deal with the defects in the software that enable cyber crimes. I don’t know how accurate these numbers are, but, um, that tells you how successful we are with malware. And it’s really hard to prevent, because software is produced by typing, and it replicates infinitely and moves at the speed of light. And it can be obfuscated, meaning that it can be written in such a way that you can’t even tell what it is, and so on. So I think if we’re going to prevent that type of malicious AI deployment, the place where there’s a bottleneck is in the hardware, because there’s only a handful of manufacturers of high-end hardware in the world. And to become one of those, if you wanted to bypass that and do it yourself, it’s gonna cost you about a hundred billion dollars to create the manufacturing capability.
And you need tens of thousands of highly trained engineers. Uh, and really, you know, so ASML in the Netherlands is the company that produces the machines that TSMC then uses to make the chips. And these are extreme ultraviolet lithography machines. They’re the only company in the world who knows how to make those, right? So it’s really, really difficult to become a rogue manufacturer of high-end chips. So what you do is you require that the chips themselves are the police, right? That the chips say, I am not going to run a software object that doesn’t come with an appropriate authority. Right, and this can be done in two ways. What we do now is we have licenses, right? And, you know, when you download software off the web, your laptop, you know, the browser, is checking the authority of the license. You know, is it up to date? Is it from a valid issuer? And it won’t let you run stuff that’s not authorized. But you can actually do something much better than that, which is to require a proof of safety, and the hardware can check that proof. This is a technology called proof-carrying code that was developed by George Necula, who was one of my colleagues at Berkeley. And so the hardware can check those proofs. So you don’t need an authority; it doesn’t have to be that the government gives you a license to run software. It doesn’t matter. The software has to come with the proof, and if the software isn’t safe, the proof won’t work. And so the hardware won’t run it. And that approach is, uh, I think a feasible approach to doing it. But as you can imagine, getting all of that in place, sort of replacing the whole stack, is gonna be a huge undertaking, but we have to do it.
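(A highly simplified sketch of the control flow Russell describes, for orientation only: software ships with a machine-checkable certificate, and the loader refuses to execute anything whose certificate does not check against the stated safety policy. The names and the trivial “checker” below are hypothetical and do not come from Necula’s actual proof-carrying-code system, where the verifier checks a genuine formal proof.)

```python
# Hypothetical sketch of the "no valid safety proof, no execution" idea.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SoftwarePackage:
    code: Callable[[], None]  # the program to run
    proof: dict               # a certificate claiming the code obeys a policy

def proof_checks_out(package: SoftwarePackage, safety_policy: str) -> bool:
    """Stand-in for a real proof checker. In genuine proof-carrying code this
    is a small trusted verifier checking a formal proof, far simpler than the
    tool that produced the proof; here it is just a placeholder comparison."""
    return package.proof.get("policy") == safety_policy

def load_and_run(package: SoftwarePackage, safety_policy: str) -> None:
    # The hardware/loader is the gatekeeper: no valid proof, no execution.
    if not proof_checks_out(package, safety_policy):
        raise PermissionError("refusing to run code without a valid safety proof")
    package.code()

# Example: a package whose certificate matches the required policy runs;
# anything else is rejected before it executes.
safe_pkg = SoftwarePackage(code=lambda: print("running"), proof={"policy": "no-self-replication"})
load_and_run(safe_pkg, "no-self-replication")
```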
PAT: Someone wants to know about, um, quantum computing. Once the fragility of qubit coherence is overcome, what role will quantum computing play in the development of AI?
STUART: Yeah, that’s a great question. So, quantum computation basically uses weird, uh, properties of quantum wave functions to get more computing power out of a fixed amount of hardware. So you might have this idea that, you know, the amount of computing that can get done on, you know, a hundred computing objects is a hundred times as much as you can get outta one computing object. And if you went to a thousand, that would be bigger. With quantum computation, it’s nonlinear in the amount of hardware that you have.
Right. And so if we had even, uh, a hundred-qubit computer, if those were reliable qubits (a qubit is a quantum bit), that would probably be more powerful than any computer we have. So what impact would that have? It would mean almost certainly the energy costs of computation would be dramatically reduced. And there’s a lot in the media about, oh my god, you know, AI is guzzling all the electricity in the world and is, you know, doubling the amount of CO2. This is all nonsense. The actual amount is somewhere on the order of 0.1% of electricity consumption for AI. Computing in general is maybe one to two percent. So a lot of the numbers in the media... I hope there are not too many journalists in the audience. Journalists maybe have a hard time with millions and billions and things like that.
PAT: A billion is just a little more than a million.
STUART: But it is growing fast and, you know, it’s significant. But the main impact would be that computations that might now take a billion years might only take a few seconds. And so from the point of view of AI, as an AI researcher, it sucks, right? Because instead of having to do some research to understand how it is that one can manage the immense complexity of life in the world using a very small amount of computation, you say, oh, to hell with it, you know, we’ll just use a vast amount of computation instead of being intelligent. And so you can sort of skip over a lot of AI research. Yeah. Uh, and just brute force the hell out of every problem. So it’s disappointing for me if that happens, but, you know, we could perhaps use that capability to solve some really, really hard problems that we want solved.
PAT: Yeah. If we get that far.
STUART: If we get that far. And the physicists and mathematicians involved are brilliant and creative, and I think they’re gonna succeed.
PAT: Okay. Great. Well, I’m getting the signal that we need to wrap up, but I hope you enjoyed this discussion. Thank you for your wonderful questions. We will have a short reception in the atrium. I believe it’s called the atrium afterward. So thank you. I appreciate it. Thank you.
[MUSIC IN]
LEAH: This is The Edge, brought to you by California magazine and the Cal Alumni Association. I’m Leah Worthington. This episode was produced by Coby McDonald, with support from Pat Joseph and Nat Alcantara. Special thanks to Stuart Russell. Original music by Mogli Maureal.
[MUSIC OUT]