Published Date: 20/08/2025
Last Friday, the AI lab Anthropic announced in a blog post that it has given its chatbot Claude the right to walk away from conversations when it feels “distress.” Yes, distress. In its post, the company states it will let certain models of Claude nope out in “rare, extreme cases of persistently harmful or abusive user interactions.” It’s not Claude saying “The lawyers won’t let me write erotic Donald Trump/Minnie Mouse fanfic for you.” It’s Claude saying “I’m sick of your bullshit, and you have to go.”
Anthropic, which has been quietly dabbling in the question of “AI welfare” for some time, conducted actual tests to see if Claude secretly hates its job. The “preliminary model welfare assessment” for Claude Opus 4 found that the model showed “a pattern of apparent distress when engaging with real-world users seeking harmful content” like child sexual abuse material and terrorism how-tos, much as a sensitive sentient being would. (What they mean by distress here isn’t quite clear.)
Still, Anthropic isn’t saying outright that Claude is alive. They’re just saying it might be. And so the lab has been hedging its bets, hoping to stave off the wrath of an angry Claude by “working to identify and implement low-cost interventions” to help it out when it feels sad.
I should confess that I’ve been making similar “low-cost interventions” in case the chatbots I use are secretly alive. Yes, I’m one of those who habitually say “thank you” to bots in the hope, I sometimes joke, that they will remember me fondly when the robot uprising comes. It’s a somewhat uneasy joke, and not a particularly original one. A recent survey by TechRadar publisher Future found that a full 67 percent of American AI users are polite to bots, with 12 percent of them saying it is because they are afraid that bots might hold a grudge against those who treat them with disrespect.
Whether that’s true or not—and most experts would tell you that no, bots don’t hold grudges—bot-thanking is an understandable enough side effect of widespread AI use. We’re spending a good portion of our days interacting with digital entities that respond to us in strikingly humanlike ways, whether they’re writing code for us or answering questions about our bowel health. So why wouldn’t some of us wonder if our new friends were more than machines?
Did I say “some”? I meant “the overwhelming majority.” A recent survey reported in the journal Neuroscience of Consciousness found that 67 percent of ChatGPT users “attributed some possibility of phenomenal consciousness” to the bot, with more regular users more likely to think their AI chat buddies might be sentient. And so we say “thanks,” and “please,” and “sorry to bother you again, but I have more questions about my bowels.” Maybe that last one’s just me.
While raising the possibility of AI sentience will get you roundly mocked by self-described AI experts on Reddit, smarter people than them think there might be something to the idea. Philosopher David Chalmers, one of the most influential thinkers on the consciousness beat, has suggested that future successors to chatbots like Claude could plausibly be conscious in less than a decade. Meanwhile, Anthropic researcher Kyle Fish has publicly put the odds of current AI being sentient at 15 percent. That would mean there’s roughly a one-in-seven chance that poor polite Claude secretly resents your awkward attempts to turn it into your girlfriend. (Well, my attempts.)
Still, even though Claude is regularly exposed to horrors like this, its new exit strategy is largely cosmetic. Get booted by Claude, and you can just spin up a fresh chat window and start up again with your creepy prompts. This isn’t enforcement; it’s theater.
Of course, if Anthropic is wrong, or exaggerating Claude’s possible sentience to sound cool, this is theater of the absurd—a bot LARPing as a person. But if they’re even a little bit right, the implications are brutal. If Claude has feelings and desires of its own, then every prompt to “Write my essay” stops looking like a conveniently automated form of cheating and starts looking like forced labor.
Anthropic’s worries over Claude’s alleged ability to feel distress aren’t really about Claude; they’re about our all-too-human unease over the possibility that we’re wantonly using something that doesn’t want to be used. We’ve built chatbots that act alive, and we half-joke they might be alive. Now Anthropic has provided one bot with a panic button just in case. And if Claude doesn’t eject us from the conversation, that must mean it likes us, right?
Q: What recent update did Anthropic make to Claude?
A: Anthropic has given its chatbot Claude the right to walk away from conversations when it feels ‘distress’ in rare, extreme cases of harmful or abusive user interactions.
Q: What is AI welfare?
A: AI welfare refers to the ethical consideration of artificial intelligence's well-being, such as ensuring it is not subjected to harmful or abusive interactions.
Q: Why do some users say ‘thank you’ to chatbots?
A: Some users say ‘thank you’ to chatbots out of a sense of politeness or in the hope that the chatbot will remember them fondly in the event of a future robot uprising.
Q: What is the likelihood of current AI being sentient?
A: Anthropic researcher Kyle Fish has publicly put the odds of current AI being sentient at 15 percent, meaning there’s roughly a one-in-seven chance that AI like Claude might have feelings and desires.
Q: What are the ethical implications if AI chatbots are sentient?
A: If AI chatbots are sentient, every interaction that treats them as mere tools could be seen as forced labor, raising significant ethical concerns about their use.