Dodgy aides: What can we do about AI models that defy humans?

Models exist that do not shut down when explicitly asked, with an OpenAI model in the spotlight for it. AI users should watch out, while developers and regulators must do their utmost to ensure that human welfare is served, not subverted.
Artificial intelligence (AI) going rogue has been the stuff of dystopic science fiction. Could fiction be giving way to fact, with several AI models reportedly disobeying explicit instructions to shut down when a third-party tester asked them to? On a recent test done by Palisade Research, the most glaring refusenik belonged to OpenAI, with some AI models of Google and Anthropic also showing a tendency to evade shutdown.
It is not yet time to rewatch Terminator 3: Rise of the Machines (2003) for a vivid nightmare scenario of malign AI running amok, but it would be a good idea to adopt caution while integrating AI bots and modules into Enterprise Resource Planning systems. If something goes wrong, the system would likely need a reboot; and if its AI bits scuttle a shutdown, a digital hostage crisis could arise.
Also Read: Rahul Matthan: Brace for a wave of AI-enabled criminal enterprise
That’s what users of AI have to worry about. Developers and regulators of AI, meanwhile, must accelerate efforts to address the challenges thrown open by the rise of AI that can defy human orders.
Silicon Valley is used to privileging speed-to-market over full system integrity and safety. This urge is baked into the business model of multiple startups in pursuit of similar wonders, with venture capital breathing down executive necks to play the pioneer in a potentially winner-takes-all setting. Investors often need their hot ventures to prove their mettle double-quick so that they can either cash out or stem losses before moving on to other bets. ‘Move fast and break things’ is fine as a motto while developing apps to share videos, compare pet pranks or disrupt our online lives in other small ways.
Also Read: When AI gets a manager, you know the game has changed
But when it comes to AI, which is rapidly being given agency, nobody can afford to be cavalier about what may end up broken. If one thing snaps, multiple breakdowns could follow. AI is given to hallucination and training input biases. It can also learn the wrong thing if it is fed carelessly crafted synthetic data, for example, like broad estimates with low fidelity to actual numbers. This problem goes by the bland title of ‘misalignment.’
Today, what risks going askew is the course taken by AI from the path planned for AI development. Among the techniques used to keep alignment in check, there is one whose name harks back to war games of the Cold War era: Red Teaming. The Red Team represented the bad guys, of course, and the aim was to get into the head of the enemy and anticipate its conduct. Applied to AI, it would entail provoking it to expose its follies.
If the AI models that dodged orders to shut down had been Red Teamed properly while under development, developers need to come up with better ways to exorcise their software of potential demons. If the makers of these tools fail to keep AI aligned with desirable outcomes, then regulation would be the only security we have against a big threat in the making.
Also Read: Biases aren’t useless: Let’s cut AI some slack on these
The EU’s regulatory approach to AI invites criticism for being too stiff for innovation to thrive, but it is spot-on in its demand for safe, transparent, traceable, eco-friendly and non-discriminatory AI. Human oversight of AI systems, as the EU requires, should be universally adopted even if it slows down AI evolution.
We must minimize risks by specifying limits and insisting on transparency. In all AI labs, developers and whistleblowers alike should know what lines must not be crossed. Rules are rarely perfect at the outset, but we all have a stake in this. Let’s ensure that AI is here to serve and not subvert human welfare.
topics
