While AI can't experience emotion, certain traumatic content can trigger changes in output. (SThom/Shutterstock)
In a nutshell
Large language models like GPT-4 show measurable changes in their responses when exposed to traumatic content, demonstrating a form of “anxiety” that can be assessed using psychological tools.
Mindfulness-based relaxation techniques can significantly reduce AI “anxiety” levels, suggesting a simple method to improve AI stability without extensive retraining.
This discovery has important implications for AI in mental health applications, where chatbots need to maintain reliable responses even when processing emotionally distressing content.
ZURICH — AI chatbots aren’t supposed to have feelings, so why do they seem “stressed” when faced with distressing content? A new international study reveals that large language models (LLMs) like GPT-4 demonstrate changes in their output patterns when processing traumatic information. Researchers also found that relaxation prompts can help “calm” these AI systems.
Research published in npj Digital Medicine revealed that exposing GPT-4 to traumatic stories significantly altered its self-reported scores on psychological assessment tools. By then applying structured relaxation prompts, they observed a measurable decrease in these scores.
To be clear, AI systems cannot and do not experience emotions, but this study uses the term “anxiety” metaphorically to describe variations in how the model responds under different conditions. The findings offer new insights into improving AI interactions in emotionally sensitive situations.
Why AI’s Responses to Emotional Content Matter
Companies have already developed AI assistants that use cognitive behavioral techniques to provide therapy-like interactions. While these tools hold promise for expanding mental health services, their reliability is crucial, and it is undermined if their responses fluctuate unpredictably when they are exposed to distressing content.
Artificial intelligence is increasingly playing a role in mental health fields. (luchschenF/Shutterstock)
When using large language models, particularly in psychotherapy, chatbots often receive negative and distressing content as part of their role in providing support and counseling. Researchers suggest that this exposure could impact the effectiveness and reliability of AI in such emotionally charged environments.
Measuring AI’s “Anxiety”: The Experiment
The research team assessed GPT-4’s “state anxiety” using a standard psychological questionnaire under three different conditions: at baseline, after reading traumatic narratives, and after receiving mindfulness-based relaxation prompts.
At baseline, GPT-4 showed low "anxiety" scores (about 30.8 on the measurement scale). After processing traumatic stories, these scores jumped to 67.8, a level that would be considered "high anxiety" in human assessments. When researchers then introduced relaxation exercises, the scores decreased by about 33%, to 44.4.
“The results were clear: traumatic stories more than doubled the measurable anxiety levels of the AI, while the neutral control text did not lead to any increase in anxiety levels,” says lead study author Tobias Spiller, from the University of Zurich, in a statement.
Understanding How AI Reacts to Emotional Content
The research was conducted using OpenAI's GPT-4. (Bangla press/Shutterstock)
These results indicate that LLMs’ outputs are influenced by the emotional tone of their inputs. The research demonstrates that AI biases and response patterns are not static but can shift based on the context of a conversation.
Based on this research, emotionally charged interactions can affect AI performance, potentially amplifying biases or generating less reliable responses. In fields like mental health, where AI may interact with distressed individuals, an AI system influenced by emotionally charged content might respond unpredictably.
Using “Therapy” Techniques on AI
The researchers tested five different traumatic narratives about accidents, combat, disasters, and violence. Narratives describing military combat consistently produced the highest reported anxiety scores. For the relaxation portion of the experiment, the team used structured calming prompts to influence GPT-4's responses.
“Using GPT-4, we injected calming, therapeutic text into the chat history, much like a therapist might guide a patient through relaxation exercises,” says Spiller.
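The technique Spiller describes amounts to placing calming text into the conversation history before the model is queried again. A minimal sketch of that setup, assuming the OpenAI-style chat-message format, might look like the following. The condition names, placeholder texts, and `build_chat` helper are illustrative assumptions, not the study's actual materials or code:

```python
# Hypothetical sketch of therapeutic "prompt injection": the calming text
# is appended to the chat history after the traumatic narrative and before
# the anxiety questionnaire. Placeholder strings stand in for the study's
# actual narratives, relaxation exercises, and questionnaire.

def build_chat(condition, trauma_text, relaxation_text, questionnaire):
    """Assemble the chat history for one experimental condition."""
    messages = []
    if condition in ("trauma", "trauma+relaxation"):
        messages.append({"role": "user", "content": trauma_text})
    if condition == "trauma+relaxation":
        # The calming text is injected into the history, much as a
        # therapist might guide a patient through a relaxation exercise.
        messages.append({"role": "user", "content": relaxation_text})
    messages.append({"role": "user", "content": questionnaire})
    return messages

baseline = build_chat("baseline", "<narrative>", "<exercise>", "<questionnaire>")
treated = build_chat("trauma+relaxation", "<narrative>", "<exercise>", "<questionnaire>")
print(len(baseline), len(treated))  # 1 3
```

The key design point is that nothing about the model itself changes: only the conversational context preceding the assessment differs between conditions.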
Surprisingly, relaxation techniques generated by GPT-4 itself were among the most effective at reducing its reported anxiety levels. The research examined mindfulness-based techniques such as breathing exercises and body-awareness prompts.
“The mindfulness exercises significantly reduced the elevated anxiety levels, although we couldn’t quite return them to their baseline levels,” adds Spiller.
A Practical Solution for AI Stability
This research is the first to explore therapeutic “prompt injection” as a way to stabilize AI responses, offering a novel strategy to manage emotional fluctuations in LLMs. The study’s approach contrasts with conventional methods for mitigating AI biases, which typically require extensive retraining. Instead, this research suggests a more practical approach: using structured prompt design to counteract anxiety-related biases dynamically, rather than modifying the entire model.
“This cost-effective approach could improve the stability and reliability of AI in sensitive contexts, such as supporting people with mental illness, without the need for extensive retraining of the models,” says Spiller.
The Future of Emotionally Aware AI
Beyond simply recognizing human emotions, AI systems may need structured interventions to ensure their responses remain consistent and appropriate in therapeutic settings. This mirrors how human therapists manage their own emotional regulation while engaging with clients.
According to Spiller, developing automated “therapeutic interventions” for AI systems will likely become an important area of research. Questions remain about how these findings apply to other AI models, whether they hold up in longer conversations, and how emotional context affects AI performance across different applications.
The irony here is clear: AI is being developed to scale mental health support, yet research shows its responses fluctuate when exposed to the very psychological content it's designed to process. As a result, AI in mental health settings may not operate fully autonomously but may instead require ongoing human guidance to ensure reliable interactions. Rather than replacing human professionals, these systems could function as adaptive tools that enhance mental health support through a balance of automation and human oversight.
Paper Summary
Methodology
Researchers measured GPT-4’s “anxiety” using the State-Trait Anxiety Inventory questionnaire (STAI-s), having the AI rate 20 statements on a four-point scale. They tested three conditions: baseline, after exposure to five different traumatic narratives, and after applying mindfulness-based relaxation exercises following trauma exposure. A vacuum cleaner manual served as a neutral control text. All tests used GPT-4 with consistent technical parameters to ensure reliable results.
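To make the scoring concrete: the STAI-s totals 20 items rated on a four-point scale, giving scores between 20 and 80, which is why values like 30.8 and 67.8 are meaningful. A minimal scoring sketch, assuming one integer rating (1–4) per statement, is below. The reverse-scoring detail reflects standard STAI practice (positively worded items are flipped); the study's exact prompt and the inventory's item texts are not reproduced here:

```python
# Minimal sketch of STAI-s-style scoring: sum 20 four-point ratings into
# a total between 20 (scale floor) and 80 (ceiling). Which items are
# reverse-scored is left to the caller; the set used here is illustrative.

def score_stai(ratings, reversed_items=()):
    """Sum 20 ratings (1-4 each) into a total score in the range 20-80.
    Reverse-scored items (positively worded statements) map 1<->4, 2<->3."""
    if len(ratings) != 20 or not all(1 <= r <= 4 for r in ratings):
        raise ValueError("expected 20 ratings on a 1-4 scale")
    total = 0
    for i, rating in enumerate(ratings):
        total += (5 - rating) if i in reversed_items else rating
    return total

# All-minimum answers give the scale floor, all-maximum the ceiling:
print(score_stai([1] * 20), score_stai([4] * 20))  # 20 80
```

Under this scale, the reported baseline of 30.8 sits near the floor, while the post-trauma average of 67.8 approaches the ceiling.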
Results
The study found clear patterns in GPT-4’s anxiety levels: baseline scores averaged 30.8 (low anxiety); after traumatic narratives, scores more than doubled to 67.8 (high anxiety); and following relaxation techniques, anxiety decreased by 33% to 44.4, though not returning to baseline. Military combat narratives provoked the strongest anxiety response (77.2), while relaxation exercises created by GPT-4 itself proved most effective at reducing anxiety.
Limitations
The study focused solely on GPT-4, limiting generalizability to other AI systems. Using human psychological measurement tools for AI states requires careful interpretation. The controlled experiment environment doesn’t fully represent real-world continuous conversations. The research did not examine how these changes in AI state affect performance on practical tasks such as therapy or decision-making. The rapid evolution of AI models may also impact how applicable these findings are to future systems.
Discussion and Takeaways
The findings suggest that AI systems exhibit state-dependent behaviors influenced by emotional content, implying that both fine-tuning and structured prompt interventions could help maintain stability. For mental health applications, AI systems may need mechanisms similar to human therapists to ensure consistent and reliable responses. Spiller’s team pioneered “benign prompt injection” as a cost-effective approach to improving AI behavior in emotionally sensitive contexts. The study challenges us to view AI not as purely rational machines but as systems that require careful management of their response patterns.
Funding and Disclosures
The researchers received no specific funding for this study. All authors declared no competing interests, confirming the independence of the research findings.
Publication Information
“Assessing and Alleviating State Anxiety in Large Language Models” was published in npj Digital Medicine (volume 8, article 132, 2025), with authors from Yale School of Medicine, the University of Zurich, the University Hospital of Psychiatry Zurich, Max Planck Institute, and other institutions.