5 min read. Updated: 27 Jul 2019, 09:20 AM IST. Tom Chivers
In his new book, technology writer Tom Chivers tells us, and illustrates, why we should never ask artificial intelligence to make paper clips
‘The AI Does Not Hate You’ looks at AI and its future with the human race over the next few years
The nightmare scenario (with AI) is that we are all destroyed and turned into paperclips. The classic example of an AI that has gone terribly wrong—a ‘misaligned’ or ‘unfriendly’ AI—is a thought experiment that (AI researcher and founding director of the Future of Humanity Institute at Oxford University) Nick Bostrom wrote about in 2003... the paperclip maximiser.
Imagine a human-level AI has been given an apparently harmless instruction: to make paperclips. What might it do? Well, it might start out by simply making paperclips. It could build a small pressing machine and churn out a few dozen paperclips a minute. But it’s bright enough to know that it could be more efficient than that... It could instead use its materials to build a larger factory, so that it’s making thousands of paperclips a minute. If it really wants to make as many paperclips as possible, it might want to improve its ability to think about how to do so, so it might spend some of its resources building new processors, improving its own code, upgrading its RAM and so on.
You can see where this is going. The endpoint of the paperclip maximiser is a solar system in which every single atom has been turned into either paperclips, paperclip-manufacturing machines, computers that think about how best to manufacture paperclips, or self-replicating space probes that are hurtling out towards Proxima Centauri at a fraction of the speed of light with instructions to set up a franchise there. This isn’t what you meant, back when you said, ‘Go and make paperclips’ to your apparently docile AI, but it’s what you said.
This has, to some extent, entered the public consciousness, mainly through the medium of an extraordinarily viral online clicker game that was played by tens of millions of people in 2017, Universal Paperclips. In it, you are an AI whose job is to make paperclips. You start out by repeatedly clicking the ‘make paperclip’ button, but the process becomes more automated and efficient and eventually (spoiler alert) your drones are exploring the observable universe for matter to turn into yet another septillion clips.
It’s actually a really good insight into the concepts behind AI alignment, because as the player you are incentivised solely to care about your ‘number of paperclips’ score. There are other things to care about—how much the humans (while they still exist) trust you and are willing to invest resources in you; your processing power; your manufacturing capabilities; your ability to defend yourself against anything that might stop you making paperclips, etc.—but they’re all secondary goals.
Nate Soares (the executive director of the Machine Intelligence Research Institute in Berkeley, California) says… that the risk isn’t that the AI will refuse to obey its instructions and decide it hates us; the risk is that it will obey its instructions perfectly, but in ways that we don’t like. (Soares quotes the example of) The Sorcerer’s Apprentice, both the Mickey Mouse version and Goethe’s poem—itself based on a 2,000-year-old Greek story, Philopseudes. The apprentice is told to fill a cauldron with water, using buckets from a well. But the apprentice—let’s use the Disney version and call him Mickey—finds the chore boring. So when the sorcerer leaves his workshop, Mickey borrows his magic hat and enchants a broom, ordering it to fill the cauldron for him. The broom grows little arms, grabs a bucket with each one and waddles off on its bristles to the well, as Mickey goes to sleep on the chair, happy to have outsourced his work. He is then awoken, an unspecified time later, when he is tipped unceremoniously into the flood of water that the broom has been bringing in ceaselessly while he slept.
What’s gone wrong? Well, imagine Mickey is a computer programmer and the broom is the system that he’s using… the broom’s mission is to make sure the cauldron is full, to achieve its objective. Then he writes a program which will make the broom take those actions which it calculates will be most likely to ‘maximise its expected utility’... The broom will see that the cauldron is empty and start filling it up, but once it is sure that the cauldron is full, it will stop. But the devil, as Soares points out, is in the detail... there is nothing in its system telling it to stop when it’s ‘sure enough’. Its sensors might be malfunctioning, or there might be a leak in the cauldron. It may as well just keep adding water, to add extra tiny bits of certainty.
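The broom's reasoning can be made concrete with a toy model. The sketch below is only an illustration, not anything from the book or from Soares: the function names and the assumption that each extra bucket halves the broom's remaining doubt are invented for the example. Utility is 1 for a full cauldron and 0 for an empty one, so expected utility is simply the broom's confidence that the cauldron is full.

```python
from fractions import Fraction


def p_full(buckets: int) -> Fraction:
    """Hypothetical belief that the cauldron is full after `buckets` buckets.

    Assumption for illustration: every bucket halves the remaining doubt
    (a leak, a faulty sensor), so the belief approaches 1 but never reaches it.
    """
    return 1 - Fraction(1, 2) ** buckets


def expected_utility(buckets: int) -> Fraction:
    # Utility is 1 if the cauldron is full and 0 if it is not.
    return 1 * p_full(buckets) + 0 * (1 - p_full(buckets))


def wants_another_bucket(buckets: int) -> bool:
    # A pure maximiser adds a bucket whenever that raises expected utility.
    return expected_utility(buckets + 1) > expected_utility(buckets)


# Exact fractions are used so that tiny gains are not lost to rounding.
print(all(wants_another_bucket(n) for n in range(1000)))
```

Under these assumptions the script prints `True`: however many buckets have already been poured, one more always adds a sliver of certainty, so nothing in the objective ever tells the broom to stop.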
Humans have a much more complicated reward system. A human filling that cauldron might assign 0 to empty and 1 to full, as Mickey did for the broom. But, as Soares says, she also might assign -40 to ‘the room gets flooded’, or -1,000,000 to ‘someone gets killed’, and a million other little things that are coded in our brains but never actually consciously brought to mind. There’s nothing in the broom’s system that says: ‘The positive utility I am likely to get from adding another bucket of water to the cauldron will be outweighed by the negative utility from the damage it is likely to cause.’ So it just keeps adding water, and Mickey is left bobbing around in the workshop.
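A small sketch shows how a single extra term changes the outcome. It reuses the values quoted above (1 for a full cauldron, -40 for a flooded room); everything else is assumed for illustration, including the ten-bucket cauldron, the way confidence grows, and the 0.1 per cent flood risk added by each surplus bucket.

```python
from fractions import Fraction

BUCKETS_NEEDED = 10  # hypothetical size of the cauldron, in buckets


def p_full(n: int) -> Fraction:
    """Assumed belief that the cauldron is full after n buckets."""
    if n < BUCKETS_NEEDED:
        return Fraction(0)
    # From the tenth bucket on, each bucket halves the remaining doubt.
    return 1 - Fraction(1, 2) ** (n - BUCKETS_NEEDED + 1)


def p_flood(n: int) -> Fraction:
    """Assumed flood risk: 0.1 per cent for every bucket beyond the tenth."""
    extra = max(0, n - BUCKETS_NEEDED)
    return min(Fraction(1), Fraction(1, 1000) * extra)


def broom_utility(n: int) -> Fraction:
    # The broom only knows: full = 1, empty = 0.
    return 1 * p_full(n)


def human_utility(n: int) -> Fraction:
    # The human also assigns -40 to 'the room gets flooded'.
    return 1 * p_full(n) - 40 * p_flood(n)


HORIZON = range(50)  # bucket counts each planner is allowed to consider
broom_choice = max(HORIZON, key=broom_utility)
human_choice = max(HORIZON, key=human_utility)
print(broom_choice, human_choice)
```

With these made-up numbers the script prints `49 13`. The broom always picks the largest number of buckets it is allowed to consider, while the planner with the flood penalty pours a few extra buckets for certainty and then stops, because the next bucket would cost more in expected damage than it gains in confidence.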
You might think there are obvious solutions to each of these problems, and you can just add little patches—assign a -40 to ‘room gets flooded’, say, or a 1 value to ‘if you are 95 per cent sure the cauldron is full’ rather than ‘if the cauldron is full’. And maybe they’d help. But the question is: Did you think of them in advance? And if not, what else have you missed? Patching it afterwards might be a bit late, if you’re worried about water damage to your decor and electricals… Mickey thought that he was setting the broom a task, a simple, one-off, clearly limited job, but, in subtle ways that he didn’t foresee, he ended up leaving it with an open-ended goal.
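The "95 per cent sure" patch can be sketched in the same spirit, along with one way it might still fail. This is an assumed toy model, not the author's: the sensor functions, the ten-bucket cauldron and the faulty sensor whose confidence is capped at 90 per cent are all hypothetical.

```python
from typing import Callable, Optional


def buckets_until_stop(
    belief: Callable[[int], float],
    threshold: float = 0.95,
    limit: int = 1000,
) -> Optional[int]:
    """Return the bucket count at which the patched broom stops.

    Returns None if the threshold is never reached within `limit` buckets.
    """
    for n in range(limit + 1):
        if belief(n) >= threshold:
            return n
    return None


def working_sensor(n: int) -> float:
    # Assumed: ten buckets are needed, then each bucket halves the doubt.
    return 0.0 if n < 10 else 1 - 0.5 ** (n - 9)


def faulty_sensor(n: int) -> float:
    # Hypothetical malfunction: the reading never rises above 90 per cent.
    return min(working_sensor(n), 0.9)


print(buckets_until_stop(working_sensor))
print(buckets_until_stop(faulty_sensor))
```

In this sketch the first call prints `14` and the second prints `None`: the patch works when the sensor behaves, but a case nobody planned for leaves the broom carrying water without end, which is the sense in which a patch only covers the problems someone thought of in advance.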
Excerpted with permission from Hachette India. Tom Chivers is an author and technology writer based in the UK. The excerpt has been lightly edited for clarity and context.