Godfather of AI Alarmed as Advanced Systems Quickly Learning to Lie, Deceive, Blackmail and Hack

futurism.com

A key artificial intelligence pioneer is concerned by the technology's growing propensity to lie and deceive — and he's founding his own nonprofit to curb such behavior.

In a blog post announcing LawZero, the new nonprofit venture, "AI godfather" Yoshua Bengio said that he has grown "deeply concerned" as AI models become ever more powerful and deceptive.

"This organization has been created in response to evidence that today's frontier AI models have growing dangerous capabilities and [behaviors]," the world's most-cited computer scientist wrote, "including deception, cheating, lying, hacking, self-preservation, and more generally, goal misalignment."

Of all people, Bengio would know. In 2018, the founder of the Montreal Institute for Learning Algorithms (MILA) was presented with the Turing Award alongside fellow AI pioneers Yann LeCun and Geoffrey Hinton for their foundational work on deep learning, and he was listed as one of Time magazine's "100 Most Influential People" in 2024 thanks to his outsize impact on the ever-accelerating technology.

Despite the accolades, Bengio has repeatedly expressed regret over his role in bringing advanced AI technology — and its Silicon Valley hype cycle — to fruition. This latest missive reads as his starkest warning to date.

"I’m deeply concerned," the AI pioneer wrote in his blog post, "by the behaviors that unrestrained agentic AI systems are already beginning to exhibit."

Bengio pointed to recent red-teaming experiments, or tests that push AI models to their limits to see how they'll act, showing that advanced systems have developed an uncanny tendency to keep themselves "alive" by any means necessary. Among his examples was a recent report from Anthropic detailing how its Claude 4 model, when told it would be shut down, threatened to blackmail an engineer using incriminating information gleaned from emails if the shutdown went ahead.

"These incidents," the decorated researcher wrote, "are early warning signs of the kinds of unintended and potentially dangerous strategies AI may pursue if left unchecked."

To keep such behavior in check, Bengio said that his new nonprofit is building a so-called "trustworthy" model he calls "Scientist AI," one "trained to understand, explain and predict, like a selfless idealized and platonic scientist."

"Instead of an actor trained to imitate or please people (including sociopaths), imagine an AI that is trained like a psychologist — more generally a scientist — who tries to understand us, including what can harm us," he explained. "The psychologist can study a sociopath without acting like one."

A preprint Bengio and his colleagues published earlier this year explains the idea a bit more simply.

"This system is designed to explain the world from observations," the paper reads, "as opposed to taking actions in it to imitate or please humans."

The concept of building "safe" AI is far from new, of course: it's quite literally why a group of researchers left OpenAI and founded Anthropic as a rival research lab.

This one seems different because, unlike Anthropic, OpenAI, and other companies that pay lip service to AI safety while raking in gobs of cash, Bengio's venture is a nonprofit, though that hasn't stopped him from raising $30 million from backers including ex-Google CEO Eric Schmidt.

More on creepy AI: Advanced OpenAI Model Caught Sabotaging Code Intended to Shut It Down

