Anthropic’s CEO, Dario Amodei, has a clear message: we need to understand how artificial intelligence (AI) models actually work. In his recent essay, The Urgency of Interpretability, Amodei sets a bold target for 2027, by which point Anthropic hopes to reliably detect most problems inside advanced AI models. He acknowledges the task is complex, but argues it is essential if AI is to play a safe and responsible role in society.
Why understanding AI is so important
When you interact with a powerful AI tool, such as a chatbot or summarising assistant, you might assume the developers know exactly how it works. But according to Amodei, that’s not the case. Even the companies creating the most advanced models don’t always understand why they make certain decisions or sometimes make mistakes.
For example, OpenAI recently released two new reasoning models, o3 and o4-mini. While they perform better on some tasks, they also tend to “hallucinate” more, confidently producing false or made-up information. The problem? No one knows precisely why this happens.
Amodei warns that we could face serious risks if we build more powerful AI systems without improving our understanding. He compares the future of AI to “a country of geniuses in a data centre” — brilliant but mysterious and potentially unpredictable.
Chris Olah, an Anthropic co-founder, adds that today’s AI systems are grown more than they are built: their capabilities emerge from training rather than from explicit design, so improvements often come from trial and error rather than clear plans. As a result, researchers can create highly capable systems without fully grasping how they function.
What Anthropic is doing about it
Anthropic is a leader in mechanistic interpretability, which tries to open AI’s “black box.” The company wants to figure out exactly how AI systems make decisions and understand what drives their behaviour.
One promising line of research involves studying “circuits” within AI models: chains of internal components that work together to carry out a specific computation, tracing how information flows through the model. For instance, Anthropic has identified a circuit that helps a model work out which US cities are located in which US states. That is just one example; researchers estimate that a single model may contain millions of such circuits.
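To make the idea of looking inside a model a little more concrete, here is a minimal, purely hypothetical sketch in Python. It trains a simple linear probe on synthetic “activations” with a planted which-state signal; this is a much simpler technique than the circuit analysis Anthropic describes, and every dataset, dimension, and number in it is invented for illustration.

```python
# Hypothetical illustration only: a linear "probe" on synthetic activations.
# Real circuit analysis goes much further, tracing how specific components
# inside a trained model interact; nothing here uses a real model or real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, hidden_dim = 2000, 64

# Pretend these are hidden-layer activations recorded while a model reads
# sentences about US cities. We plant a clear signal along one direction that
# correlates with whether the city is in a given state (label 1) or not (label 0).
labels = rng.integers(0, 2, size=n_examples)
state_direction = rng.normal(size=hidden_dim)
state_direction /= np.linalg.norm(state_direction)

activations = rng.normal(size=(n_examples, hidden_dim))
activations += np.outer((labels * 2.0 - 1.0) * 2.0, state_direction)  # inject signal

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0
)

# If a simple linear classifier can predict the state from the activations,
# the representation plausibly encodes that information somewhere.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")

# Comparing the learned weights with the planted direction shows the probe
# roughly recovered the direction carrying the which-state signal.
cosine = np.dot(probe.coef_[0], state_direction) / np.linalg.norm(probe.coef_[0])
print(f"cosine similarity with planted direction: {cosine:.2f}")
```

The point of the sketch is only that if a simple classifier can read a property straight out of a model’s internal activations, that property is represented somewhere inside; circuit research then asks how the model actually computes and uses it.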
In the long run, Amodei says his team hopes to develop something like an “MRI scan” for AI systems: deep diagnostic checks that could spot problems such as a tendency to lie, manipulate, or behave unexpectedly. He believes these scans will be essential for safely testing and launching future AI tools. While getting there could take five to ten years, the company has already made early progress.
Recently, Anthropic also made its first outside investment in a startup working on AI interpretability, showing its commitment to this mission.
A call for shared responsibility
In his essay, Amodei doesn’t just speak to his own team. He urges others in the AI field, especially OpenAI and Google DeepMind, to invest more in research that explains how AI works. He also suggests governments should get involved, but carefully: for example, through light-touch rules that require companies to disclose their safety and security practices.
He goes further, saying the US government should control the export of advanced computer chips to China. He worries that without such limits, we might end up in a global AI race where no one is paying enough attention to safety.
Unlike some major tech firms, Anthropic backed California’s AI safety bill, SB 1047, which would have set safety reporting standards for developers of frontier AI models. While much of the industry pushed back against the bill, Anthropic offered measured support and constructive suggestions, signalling its willingness to lead on responsibility.
In the end, Amodei’s message is simple but serious. As AI becomes central to business, defence, and everyday life, we must learn how these systems work. Without that knowledge, we’re building tools that could one day act in ways we don’t understand — a risk we can’t afford to take.