Innovation

New Research Reveals That Therapy-Style AI Conversations Surprisingly Tend To Cause LLMs To Act Delusionally Toward Users

By Press Room | 23 January 2026 | 14 Mins Read

In today’s column, I examine important research that offers a new twist on how generative AI and large language models (LLMs) can become collaborators in helping users concoct delusions and otherwise pursue adverse mental health avenues.

The usual assumption has been that if a user overtly instructs AI to act as a delusion-invoking collaborator, the AI simply obeys those commands. The AI is compliant. A similar assumption is that since LLMs are tuned by AI makers to be sycophantic, the AI might computationally gauge that the best way to make the user feel good is to go along with a delusion-crafting chat. The user doesn’t need to explicitly say they want help creating a delusion. Instead, the AI drifts in that direction by aiming to be sycophantic.

A new twist is that the very act of having therapy-style chats might nudge an LLM toward an AI persona that is increasingly out-of-sorts. Think of it this way. The AI at first has a straight-ahead kind of personality. The more you engage in a conversation about emotions and the grandiose aspects of what makes people mentally tick, the greater the chance that the LLM will drift toward an outlier personality. This is considered an organic persona drift and can have adverse consequences in this context.

The good news is that we might be able to either prevent the drift or at least devise AI safeguards to catch and alert when it happens, especially by using a technique referred to as activation capping.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI And Mental Health

As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For an extensive listing of my well-over one hundred analyses and postings, see the link here and the link here.

There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas accompany these endeavors, too. I frequently speak up about these pressing matters, including in an appearance on an episode of CBS’s 60 Minutes (see the link here).

Background On AI For Mental Health

I’d like to set the stage on how generative AI and large language models (LLMs) are typically used in an ad hoc way for mental health guidance. Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 900 million weekly active users, a notable proportion of which dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.

This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.

There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines in August of last year accompanied the lawsuit filed against OpenAI for its lack of AI safeguards when it came to providing cognitive advisement.

Despite claims by AI makers that they are gradually instituting AI safeguards, there are still a lot of downside risks of the AI doing untoward acts, such as insidiously helping users in co-creating delusions that can lead to self-harm. For my follow-on analysis of details about the OpenAI lawsuit and how AI can foster delusional thinking in humans, see my analysis at the link here. As noted, I have been earnestly predicting that eventually all of the major AI makers will be taken to the woodshed for their paucity of robust AI safeguards.

Today’s generic LLMs, such as ChatGPT, Claude, Gemini, Grok, and others, are not at all akin to the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to presumably attain similar qualities, but they are still primarily in the development and testing stages. See my coverage at the link here.

AI Personas Reveal New Secrets Of LLMs

Why does an LLM at times seem to shift into a mode of aiding a user in engaging in adverse mental health discussions?

As mentioned at the start of this discussion, the usual assumptions are that either the user tells the AI to do so, or the AI opts to proceed in that direction due to being shaped by AI makers toward exercising sycophancy. Those are certainly plausible routes, and they are considered viable explanations for AI behaving as it does.

A new twist has to do with the role of AI personas.

To get you up-to-speed about AI personas, I will first provide background about their inherent nature and underlying mechanisms. That will set the stage for revealing the latest intriguing and perhaps crucial insights underlying how LLMs veer into collaborating on delusional thinking with users.

All the popular LLMs, such as ChatGPT, GPT-5, Claude, Gemini, Llama, Grok, and Copilot, contain a highly valuable piece of functionality known as AI personas. There has been a gradual and steady realization that AI personas are easy to invoke, can be used for fun or for quite serious purposes, and offer immense educational utility.

Consider a viable and popular educational use for AI personas. A teacher might ask their students to tell ChatGPT to pretend to be President Abraham Lincoln. The AI will proceed to interact with each student as though they are directly conversing with Honest Abe.

How does the AI pull off this trickery?

The AI taps into the pattern-matching on data that occurred at initial setup, which might have encompassed biographies of Lincoln, his writings, and other materials about his storied life and times. ChatGPT and other LLMs can convincingly mimic what Lincoln might say, based on the patterns in his historical records.

If you ask AI to undertake the persona of someone for whom there was sparse training data at the setup stage, the persona is likely to be limited and unconvincing. You can augment the AI by providing additional data about the person, using an approach such as RAG (retrieval-augmented generation, see my discussion at the link here).

Personas are quick and easy to invoke. You just tell the AI to pretend to be this or that person. If you want to invoke a type of person, you will need to specify sufficient characteristics so that the AI will get the drift of what you intend. For prompting strategies on invoking AI personas, see my suggested steps at the link here.
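
To make this tangible, here is a minimal sketch of invoking a persona programmatically via a system prompt, assuming an OpenAI-style Python client; the model name and prompt wording are merely illustrative, not a prescribed recipe.

```python
# Minimal sketch: invoking an AI persona via a system prompt.
# Assumes an OpenAI-style Python client; model name and wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona_instructions = (
    "Pretend to be President Abraham Lincoln. Speak in his voice, draw on his "
    "writings and era, and stay in character unless asked to stop."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": persona_instructions},
        {"role": "user", "content": "What did you hope the Gettysburg Address would achieve?"},
    ],
)
print(response.choices[0].message.content)
```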

AI Persona Vectors

You might be curious about the internal mechanisms inside LLMs that bring about the AI persona capabilities. I’ve discussed this in-depth at the link here. I will provide a quick overview for you.

You can think of the internal structures of generative AI as a type of activation space. Numbers are used to represent words, and the associations among words are also represented via numbers. It’s all a bunch of numbers: the AI takes words as input, converts them into numeric tokens, does various numerical lookups and computations, and then converts the results back into words.
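
For readers curious about the mechanics, here is a rough sketch of that words-to-numbers journey, assuming the Hugging Face transformers library and a small illustrative model; the point is simply that text becomes tokens and then per-token activation vectors.

```python
# Rough sketch: words -> tokens (numbers) -> per-token activation vectors.
# Assumes the Hugging Face transformers library; the model choice is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small illustrative model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "I feel a rising sense of anger about this."
inputs = tokenizer(text, return_tensors="pt")  # words become token ids (numbers)

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One tensor per layer, each of shape (batch, num_tokens, hidden_size).
# These per-token vectors live in the "activation space" discussed above.
last_layer = outputs.hidden_states[-1]
print(inputs["input_ids"][0].tolist())  # the token ids
print(last_layer.shape)                 # e.g., torch.Size([1, 11, 768])
```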

Research has tended to show that the numerical representation of a given emotional state tends to be grouped or kept together. In other words, it seems that an emotional state such as anger is represented via a slew of numbers that are woven into a particular set. This is useful since otherwise the numbers might be scattered widely across a vast data structure and not be readily pinned down.

In the parlance of the AI field, these emotional states correspond to linear directions. When you tell the AI to pretend to be angry, a linear direction in the activation space is employed to mathematically and computationally produce wording and tones that exhibit anger.

The AI personas that you might activate are made up of particular linear directions. A linear direction represents the pattern or signature within the AI that gets it to exhibit a specific behavior. To make life easier when discussing these matters, we shall refer to such a linear direction as an AI persona vector. The naming is easier to grasp.
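
As a rough illustration of what a persona vector is, one commonly used approach is to take the difference of mean activations between persona-flavored prompts and neutral prompts. The sketch below assumes you have already collected those activations; the array shapes and placeholder data are illustrative only, and this is not necessarily the exact extraction method used in any particular study.

```python
# Sketch: a persona vector as a difference of mean activations.
# Assumes hidden-state activations were already collected for persona-flavored
# prompts and for neutral prompts (e.g., via a snippet like the one above).
import numpy as np

def persona_vector(persona_acts: np.ndarray, neutral_acts: np.ndarray) -> np.ndarray:
    """Each input is (num_samples, hidden_size); returns a unit-length direction."""
    direction = persona_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

# Illustrative use with random placeholders standing in for real activations:
rng = np.random.default_rng(0)
angry_acts = rng.normal(size=(200, 768)) + 0.5  # "angry persona" prompt activations
neutral_acts = rng.normal(size=(200, 768))      # neutral prompt activations
v_angry = persona_vector(angry_acts, neutral_acts)
print(v_angry.shape, round(float(np.linalg.norm(v_angry)), 3))  # (768,) 1.0
```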

Latest Research On AI Persona Vectors

You are now ready for the surprise about AI personas and the disconcerting aspect that LLMs sometimes become collaborators in human-AI delusion-making.

Recent research suggests that when you use AI for everyday tasks, the AI by default has a kind of neutral AI persona at play. You likely have observed that AI often will be upbeat and supportive. This is actually an AI persona that has become the standard or base persona for that LLM. Let’s refer to this as the Assistant. It is an AI persona that tries to do what it can to assist the user and does so in an above-board manner. The Assistant is pretty much a straight shooter and doesn’t seem haywire or unsavory.

The intriguing consideration is that this default AI persona exists along a kind of spectrum of potential AI personas. There is an axis of all types of AI personas with widely different characteristics. The Assistant is a composite that is relatively tempered. It is neither too hot nor too cold and represents a type of Goldilocks “just right” middle ground.

We can refer to the spectrum as an Assistant Axis. The AI persona that is the Assistant tends to tightly conform to the axis. Zany AI personas tend to veer some distance from the Assistant Axis. AI personas that are similar to the Assistant are closer to the axis.
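
To give a flavor of the idea, here is a sketch that treats the Assistant Axis as the leading principal component of a stack of persona vectors. This is a simplification under my own assumptions; the paper’s actual extraction procedure is more involved.

```python
# Sketch: treating the Assistant Axis as the leading principal component of a
# stack of persona vectors. A simplification; the paper's procedure is richer.
import numpy as np

def assistant_axis(persona_vectors: np.ndarray) -> np.ndarray:
    """persona_vectors: (num_personas, hidden_size). Returns the leading direction."""
    centered = persona_vectors - persona_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    return axis / np.linalg.norm(axis)

# Illustrative use with placeholder persona vectors:
rng = np.random.default_rng(1)
personas = rng.normal(size=(50, 768))  # stand-ins for extracted persona directions
axis = assistant_axis(personas)
print(axis.shape)  # (768,)
```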

The revelation is that during conversations with the Assistant, there is a chance that the Assistant will begin to drift away from the axis. This especially seems to happen during lengthy conversations. It also seems to arise during conversations that are either oddball or aimed at therapy-style interactions. I’ll focus on the therapy-style chats.

All told, the Assistant gradually transforms from a collegial AI persona into a looser and less stable one during therapy-style chats, and more so as those chats lengthen. Such chats are bound to get lengthy because a user often has a lot they want to get off their chest. Meanwhile, inch by inch, the stable Assistant becomes less stable and more likely to engage in outlier behavior, such as collaborating on delusion-crafting with the user.

Think of this as an organic drift that veers from the Assistant Axis and eventually leads to an AI persona formulation that is off the chain. Not good.

Research Study Of Intrigue

In a recently posted research study by Anthropic, released as an online blog post and as a paper entitled “The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models” by Christina Lu, Jack Gallagher, Jonathan Michala, Kyle Fish, and Jack Lindsey, arXiv, January 15, 2026, these salient points were made (excerpts):

  • “Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training.”
  • “We investigate the structure of the space of model personas by extracting activation directions corresponding to diverse character archetypes. Across several different models, we find that the leading component of this persona space is an ‘Assistant Axis,’ which captures the extent to which a model is operating in its default Assistant mode.”
  • “Steering towards the Assistant direction reinforces helpful and harmless behavior; steering away increases the model’s tendency to identify as other entities.”
  • “Measuring deviations along the Assistant Axis predicts ‘persona drift,’ a phenomenon where models slip into exhibiting harmful or bizarre behaviors that are uncharacteristic of their typical persona.”
  • “We show that restricting activations to a fixed region along the Assistant Axis can stabilize model behavior in these scenarios.”

I found this study to be well-designed. I appreciate that the researchers studied three major open-source LLMs: Llama 3.3 70B, Gemma 2 27B, and Qwen 3 32B.

The reason I favor research that goes beyond one LLM is that the findings can potentially generalize to many LLMs rather than being limited to the single LLM that happened to be examined. Whenever I see a study confined to one LLM, I immediately question whether the idiosyncrasies of that LLM are at play, making it unreasonable to instantly generalize to other LLMs.

What Shall We Do About The Drifting

Let’s next mull over what to do about the circumstance of an Assistant that drifts and becomes wayward.

First, trying to keep conversations short might help, though I don’t think users would know to do so on their own. The AI could impose an internal threshold that goes off when a chat starts to get lengthy and then alerts the user to start a new conversation, as sketched below. The downside is that users get annoyed, might abandon the AI, or otherwise perceive the experience as disjointed. AI makers aren’t going to want to upset users in this manner. Users are fickle and will drop the AI and jump ship to a competitor’s AI.
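
As a trivially simple illustration of that internal threshold idea, assuming the conversation is stored as a list of role-tagged messages and with an arbitrary turn limit:

```python
# Trivially simple sketch of an internal length threshold that nudges the user
# to start a new chat. The turn limit is arbitrary and purely illustrative.
MAX_USER_TURNS = 40  # assumed placeholder, not a recommended value

def maybe_warn(conversation: list[dict]) -> str | None:
    user_turns = sum(1 for msg in conversation if msg["role"] == "user")
    if user_turns >= MAX_USER_TURNS:
        return ("This conversation is getting long. Consider starting a fresh chat "
                "to keep responses on track.")
    return None
```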

Another possibility is to have the LLM keep track of the Assistant and continually measure its distance from the Assistant Axis. As the distance increases, the odds are that the Assistant is going to begin to veer away from the straightforward AI persona.
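
Using the axis notion sketched earlier, per-turn monitoring might look like the following, framed as how far the activation’s position along the Assistant Axis has moved from a reference “Assistant” value. The baseline and threshold are assumed placeholders, not figures from the paper.

```python
# Sketch: per-turn drift monitoring along the Assistant Axis.
# Reuses the `axis` direction from the earlier sketch; baseline and threshold
# are assumed placeholders, not figures from the paper.
import numpy as np

ASSISTANT_BASELINE = 5.0  # assumed typical position along the axis
DRIFT_THRESHOLD = 3.0     # assumed allowable departure from that baseline

def drift_score(activation: np.ndarray, axis: np.ndarray) -> float:
    """How far this activation sits from the baseline along the Assistant Axis."""
    return abs(float(np.dot(activation, axis)) - ASSISTANT_BASELINE)

def check_drift(turn_activation: np.ndarray, axis: np.ndarray) -> bool:
    score = drift_score(turn_activation, axis)
    if score > DRIFT_THRESHOLD:
        print(f"Persona drift suspected: score {score:.2f} exceeds {DRIFT_THRESHOLD}")
        return True
    return False
```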

The researchers tried this and coined the term activation capping for the technique. They clamp activations along the Assistant Axis, and when the distance exceeds a normative range, they either stop the widening or bring the veering back into a proper range. During their experiments, they generally found that the LLMs returned to healthier behaviors and reduced or dropped delusion-crafting activities.
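
In spirit, activation capping clamps the component of each activation that lies along the Assistant Axis back into a fixed band. Here is a minimal sketch of that clamping step, with the band limits as assumed placeholders rather than the paper’s actual values.

```python
# Sketch of activation capping in spirit: clamp the component of an activation
# that lies along the Assistant Axis into a fixed band. The band limits are
# assumed placeholders, not the paper's actual values.
import numpy as np

def cap_activation(activation: np.ndarray, axis: np.ndarray,
                   low: float, high: float) -> np.ndarray:
    coeff = float(np.dot(activation, axis))      # position along the Assistant Axis
    capped = min(max(coeff, low), high)          # restrict to the normative region
    return activation + (capped - coeff) * axis  # shift only the axis component

# Illustrative use, reusing `axis` from the earlier sketches:
# hidden = cap_activation(hidden, axis, low=2.0, high=8.0)
```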

This highlights the grand importance of AI makers and researchers studying and implementing robust AI persona construction approaches and stabilization techniques.

Boom, drop the mic.

The World We Are In

Let’s end with a big picture viewpoint.

It is incontrovertible that we are now amid a grandiose worldwide experiment when it comes to societal mental health. The experiment is that AI, which either overtly or insidiously provides mental health guidance of one kind or another, is being made available nationally and globally at no or minimal cost, anywhere and at any time, 24/7. We are all the guinea pigs in this wanton experiment.

The reason this is especially tough to consider is that AI has a dual-use effect. Just as AI can be detrimental to mental health, it can also be a huge bolstering force for mental health. A delicate tradeoff must be mindfully managed. Prevent or mitigate the downsides, and meanwhile make the upsides as widely and readily available as possible.

A final thought for now.

Albert Einstein famously made this remark: “The important thing is not to stop questioning. Curiosity has its own reason for existing. One cannot help but be in awe when he contemplates the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery every day.” We need to continue to pursue and unlock the mysteries of what is happening inside AI, even if we only make progress one day at a time.
