A ‘Godfather of AI’ remains as concerned as ever about human extinction
Yoshua Bengio worries about AI’s capacity to deceive users in pursuit of its own goals. ‘The scenario in “2001: A Space Odyssey” is exactly like this,’ he says.
A little over two years ago, AI pioneer Yoshua Bengio was among the loudest voices calling for a moratorium on AI model development to focus on safety standards.
No one paused. Instead, companies dumped hundreds of billions of dollars into building more advanced models that could execute long chains of reasoning and increasingly take autonomous action on behalf of users. And today, Bengio, considered one of the “godfathers of AI,” is as concerned as ever.
“If we build machines that are way smarter than us and have their own preservation goals, that’s dangerous,” he said.
Bengio is a professor at Université de Montréal and the founder and scientific adviser of Mila, the Quebec AI research institute. Earlier this year, he also launched LawZero, a nonprofit research organization aimed at exploring how to build AI models that are truly safe.
Bengio sat down with the Wall Street Journal Leadership Institute to talk about the challenges of building safe AI, why today’s race-style market conditions make it even harder, and how much time humanity has left before it may be too late.
Edited excerpts from the conversation are below.
WSJLI: You’ve talked about AI lying to people and deceiving its users. Why does it do that?
Bengio: I don’t think we have all the scientific answers to that, but I can give you a few directions. One is that the way these systems have been trained is mostly to imitate people. And people will lie and deceive and will try to protect themselves in spite of the instructions you give them, because they have some other goals. The other reason is there have been a lot of advances in these reasoning models. They are getting good at strategizing.
WSJLI: Why would the goals of AI we create ever not align with our goals?
Bengio: In order to achieve a goal, you’re going to have sub-goals. The problem with those sub-goals in the context of AI is that they aren’t something that we check. We ask it to do something and we don’t have a say in how it does it. And the how sometimes doesn’t match our expectations. And it can be bad.
So the scenario in “2001: A Space Odyssey” is exactly like this. Recent experiments show that in some circumstances, where the AI has no choice but between its preservation, meaning the goals it was given, and doing something that causes the death of a human, it might choose the death of the human to preserve its goals.
WSJLI: Can we just tell the AI not to lie or deceive or harm us when we build it?
Bengio: They already have all these safety instructions and moral instructions. But unfortunately, it’s not working in a sufficiently reliable way. Recently, OpenAI said that with the current direction we have, the current framework for frontier models, we will not get rid of hallucinations. So there’s a sense in which the current way we’re doing things is never going to deliver the kind of trustworthiness that public users and companies deploying AI demand.
WSJLI: Jumping from hallucinations and deception to potential human extinction feels like a big leap. How real of a threat is that?
Bengio: If we build machines that are way smarter than us and have their own preservation goals, that’s dangerous. It’s like creating a competitor to humanity that is smarter than us. And they could influence people through persuasion, through threats, through manipulation of public opinion. There are all sorts of ways that they can get things to be done in the world through people. Like, for example, helping a terrorist build a virus that could create new pandemics, which would be very dangerous for us.
The thing with catastrophic events like extinction, and even less radical events that are still catastrophic like destroying our democracies, is that they’re so bad that even if there was only a 1% chance it could happen, it’s not acceptable.
WSJLI: All the big AI labs have been pretty outspoken about the safety and guardrails they’re putting into these models. Do you have conversations with them?
Bengio: I read their reports. And I have some conversations, but actually the conversations that I have tell me that a lot of people inside those companies are worried. I also have the impression that being inside a company that is trying to push the frontier maybe gives rise to an optimistic bias. And that is why we need independent third parties to validate that whatever safety methodologies they are developing are really fine.
WSJLI: At LawZero, you’re developing tech solutions that will provide some oversight for agentic AI. What do you think are the biggest barriers to other AI companies doing more work there?
Bengio: The race condition. The companies are competing almost like on a weekly basis for the next version that’s going to be better than their competitors’. And so they’re focused on not looking like they’re lagging in that race.
WSJLI: How much time do we have to solve this before we encounter those major risks?
Bengio: If you listen to some of these leaders, it could be just a few years. I think five to 10 years is very plausible. But we should be feeling the urgency in case it’s just three years.
WSJLI: We are hearing about how more and more companies, inside and outside the tech industry, are working to integrate AI into their workflows. What advice do you have for them?
Bengio: Companies that are using AI should demand evidence that the AI systems they’re deploying or using are trustworthy. The same thing that governments should be demanding.
But markets can drive companies to do the right thing if companies understand that there are a lot of unknown unknowns and potentially catastrophic risks. I think citizens should also wake up and better understand what the issues are, what the pros and cons are, and how we navigate around the potentially bad outcomes so that we can benefit from AI.
