Will this make AI safer and less scary?

Today, patients and doctors have no way of knowing how an AI algorithm arrived at its reading of an x-ray. Anthropic’s breakthrough will make such processes more transparent. But the features Anthropic has identified are only a small subset of the concepts the model has learned. Finding a full set of features with current techniques would require more computing power and more money than were used to train the model in the first place. Besides, understanding the model’s representations doesn’t tell us how it uses them.