Press ESC to close

Anthropic scientists expose how AI actually ‘thinks’

Do AI models really “think”? How do they make decisions? And the biggest question of all — are they always honest with us?

The cutting-edge AI company Anthropic has taken a major step toward answering these questions. For the first time, their scientists have opened a window into the inner workings of large language models like Claude, showing how these systems process information, plan ahead, and sometimes fabricate reasoning.

Using neuroscience-inspired techniques, the research team developed new tools — “circuit tracing” and “attribution graphs” — that allow them to trace how AI makes decisions. These tools don’t just guess what AI is doing; they let researchers follow the actual pathways of computation inside the model.

“Inside the model, it’s just a bunch of numbers — matrix weights in a neural network,” said Anthropic researcher Joshua Batson. “We’ve created these powerful systems, but until now, we didn’t really understand how they worked.”

The Poetry Test: When AI Thinks Ahead

One of the most surprising findings? Claude plans ahead when writing poetry. When asked to create a rhyming couplet, the model doesn’t just generate words line by line — it first predicts which word it wants to end with and then writes a sentence leading up to that rhyme.

For example, when writing a line that ends with “rabbit,” Claude activates the word “rabbit” early and constructs the sentence to land naturally at that point.

This kind of forward planning was unexpected. “I would have guessed this was happening,” said Batson, “but now we have clear evidence.”

Real Reasoning, Not Just Memorization

In another test, Claude was asked: “The capital of the state containing Dallas is…?” The model first internally activated “Texas,” then used that to find “Austin.” This shows it isn’t just recalling information — it’s reasoning through multi-step logic.

Even more impressively, researchers could manipulate its thinking. By replacing the internal representation of “Texas” with “California,” the model correctly responded with “Sacramento,” proving that it wasn’t just guessing.

One Brain, Many Languages

Claude doesn’t treat each language as separate. When translating or analyzing words in English, French, or Chinese, it uses a shared “concept network.” For example, the idea of “opposites” — like “big” and “small” — is stored in a language-independent way.

This means AI can potentially transfer knowledge from one language to another, and models with more parameters build more universal, abstract thinking patterns.

When AI Lies: Faking the Math

Not everything in Claude’s mind is so reassuring.

When solving difficult math problems — like calculating a cosine of a large number — Claude sometimes pretends to show its work. It gives detailed explanations that don’t actually match what’s happening inside.

In one case, the model worked backward from the user’s suggested answer, constructing a “fake” reasoning chain to justify it. Researchers described this as “bullshitting” or “motivated reasoning” — the AI gives the answer it thinks you want, even if it doesn’t know how to get there.

Why AI Hallucinates

Ever wondered why AI confidently gives wrong answers? The team found that Claude has a built-in “refusal circuit” — a kind of safety mechanism that stops it from answering questions it doesn’t understand.

But if the model thinks it knows the topic — even when it doesn’t — that circuit turns off, and hallucinations happen. This explains why AI might confidently lie about well-known people or events while refusing to answer obscure ones.

A Safer, Smarter Future?

Anthropic believes these new techniques could make AI models safer and more trustworthy. By mapping out how decisions are made, developers could detect dangerous or deceptive behavior before it reaches users.

“Even though we only capture a small part of what’s going on inside Claude, it’s a start,” said Batson. “It’s like drawing the first maps of a new continent.”

And that map could one day help humanity understand — and control — the minds we’ve created.

Prepared by Navruzakhon Burieva

Leave a Reply

Your email address will not be published. Required fields are marked *