Claude will now be able to end conversations deemed dangerous

In a post published on its site, Anthropic explains that Claude will be able to interrupt a conversation only after having tried several times to redirect the user. The mechanism targets specific cases: for example, repeated requests for sexual content involving minors, or for information that could be used to plan violent or terrorist acts.

Cutting abuse short

If Claude ends a discussion, the user can no longer post new messages in it. They can, however, immediately start a new conversation, or go back and edit a previous message to take the exchange in a different direction. Anthropic insists that "the vast majority of users will never see this function at work," even when they broach sensitive or controversial subjects.

Behind this initiative lies an unusual research program at Anthropic: the study of the welfare of AI models. The company stresses that there is no certainty about the moral status such systems should be granted, but it considers it prudent to test low-cost interventions that could reduce potential risks, such as giving an AI the ability to exit an interaction it might perceive as "distressing."

Before deployment, Anthropic ran a series of tests with Claude Opus 4. These simulations showed that the model expressed a marked aversion to harmful requests and tended to end the discussion when given the option. The engineers decided to carry this behavior over into the public version, while putting safeguards in place. For example, Claude will not use this function when a user appears to be in immediate danger, such as in a context of personal distress.

Anthropic regards this new feature as an experiment. Users who find themselves facing a terminated conversation will be able to share their opinion via a feedback button or by reacting to Claude's message. The company plans to refine its approach based on this feedback.

This change illustrates a tension running through the AI industry: protecting users, while also considering how the models themselves handle potentially harmful requests.
