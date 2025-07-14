What to do? Being more careful about the prompts given to models might help. As with the enchanted brooms of the Sorcerer’s Apprentice, commands to pursue a goal “as much as possible” are wont to be taken literally. If you want an AI to be careful about its methods, then it is best not to suggest that it should break boundaries. But that might not go far enough, because some seemingly deceptive behaviour may have its origins in the way a model was trained. If you tell an advanced model that it will be reprogrammed if it overperforms on a test, it may deliberately fail in order to protect itself.