An artificial intelligence system sitting across from a psychiatrist sounds like satire, yet Anthropic has turned that image into a research protocol. The company recently disclosed that its Claude models, including a variant called Mythos, have undergone structured clinical-style interviews designed to mimic diagnostic sessions in psychiatry, with the goal of testing how the systems reason about self, others, and stress.
Instead of measuring only benchmark accuracy or token-level cross-entropy, Anthropic is borrowing tools from clinical assessment to probe phenomena closer to affect regulation and cognitive coherence. Psychiatrists and psychologists frame questions that would normally surface delusions, dissociation, or maladaptive defense mechanisms, then watch how the model navigates ambiguity, threat, and moral conflict over extended dialogues rather than single-turn prompts.
Within this framework, Mythos has been described by Anthropic as “the most psychologically settled model we have trained to date,” a phrase that hints at internal consistency rather than any literal inner life. Researchers track patterns that resemble impulse control, theory of mind, and risk appraisal, asking whether the system escalates, de-escalates, or reframes when pushed into edge-case scenarios involving harm, identity, or social rejection.
The experiment underscores a shift in safety culture: as models become more conversational and agentic, traditional metrics like perplexity or parameter count reveal less about how they behave under pressure. By importing psychiatric lenses into alignment work, Anthropic is treating conversational stability and resilience almost as a kind of mental hygiene for code, using human diagnostic frameworks as a stress test for the social and ethical behavior of large-scale neural networks.