In today’s column, I examine a new technique that leverages multiple generative AI and large language models (LLMs) to provide users with better and safer mental health advice.
The approach interacts with a user seeking psychological guidance and, behind the scenes, actively draws on multiple distinct LLMs to produce the interaction. From the user’s perspective, they appear to be conversing with just one AI, yet they are getting the robustness that comes from multiple LLMs. The trick is that the responses from the LLMs are transformed into a single cohesive reply via response-level fusion. This is akin to how self-driving cars make use of multi-sensor data fusion (MSDF).
In the mental health context, I call this a fusion-based unified support engine for mental health, with the convenient acronym FUSE-MH.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
AI And Mental Health
As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For an extensive listing of my well-over one hundred analyses and postings, see the link here and the link here.
There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors, too. I frequently speak up about these pressing matters, including in an appearance on an episode of CBS’s 60 Minutes, see the link here.
Background On AI For Mental Health
I’d like to set the stage on how generative AI and large language models (LLMs) are typically used in an ad hoc way for mental health guidance. Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 900 million weekly active users, a notable proportion of which dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.
This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.
There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines in August of this year accompanied the lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement.
Despite claims by AI makers that they are gradually instituting AI safeguards, there are still a lot of downside risks of the AI doing untoward acts, such as insidiously helping users in co-creating delusions that can lead to self-harm. For my follow-on analysis of details about the OpenAI lawsuit and how AI can foster delusional thinking in humans, see my analysis at the link here. As noted, I have been earnestly predicting that eventually all of the major AI makers will be taken to the woodshed for their paucity of robust AI safeguards.
Today’s generic LLMs, such as ChatGPT, Claude, Gemini, Grok, and others, are not at all akin to the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to presumably attain similar qualities, but they are still primarily in the development and testing stages. See my coverage at the link here.
How AI Drops The Ball
One of the major issues with using generative AI to provide mental health advice is that the AI can go awry for a variety of reasons. For example, an LLM can produce an AI hallucination, which is when the AI makes up something that is not grounded in reality and yet typically appears to be truthful; see my in-depth coverage at the link here. During a mental health chat, if the AI experiences a hallucination, it could end up telling a user to do something quite harmful. Not good.
The chances of a user getting an AI hallucinated response are somewhat slim, since estimates are that it only occurs perhaps 1% to 3% of the time (this is a rule of thumb, and it varies dramatically in actual practice). Of course, if the AI hallucination happens at the worst of times, such as when a user is mentally vulnerable and relying on AI to give them sound advice, it can be devastating to the user.
All kinds of AI safeguards are being devised to try to ensure that AI stays on the appropriate path when generating psychological guidance. There isn’t a silver bullet at this time. Think of the AI safeguards as a proverbial Swiss cheese framework. Envision a layer of protective measures that has holes. If there is another layer behind it, the hope is that even if something awry gets through the first layer, it won’t find a hole to get through in the second layer.
A key assumption about AI safeguards is that they are conventionally implemented by an AI maker in their particular LLM. A user logs into an LLM and interacts solely with that LLM. The user is now reliant on the AI safeguards of that LLM. A different LLM might have different AI safeguards; each LLM has its own set.
Perhaps we can think outside the box on this conundrum of how to increase safety for a user who is making use of AI for mental health advice.
Multiple LLMs At The Same Time
Here’s what can be undertaken.
Suppose I log into an LLM and start a mental health conversation. I decide to log into a different LLM and use it for a mental health conversation too. I have two LLMs going at once. I am going to discuss my mental health with both of them.
Why would I do this?
The beauty is that the odds of both going awry at the same moment in time are exceedingly low.
One might have an AI hallucination, but the other LLM, which is acting completely independently of the first AI, is unlikely to have an AI hallucination at that same juncture. Thus, one of the AIs is going to be giving me straight-ahead advice, while the other one has gone off the deep end.
I might not know which is which. If the responses by both LLMs seem sensible to me, I might not realize that one of them is feeding me a bunch of junk. One might be telling me to do nothing about my mental health concerns, and the other is suggesting that I take various stipulated steps to deal with the issue. Which should I accept as valid?
We could try to increase my odds of knowing which is which. Suppose we added a third LLM to the mix. The idea is that I have three LLMs working in real-time, each independent of the other, and they provide me with mental health guidance. When I get the three responses, I can compare them. Two of the responses would hopefully align. The outlier is potentially the one that has gone off the rails.
Voila, by using multiple LLMs for my mental health counseling, I am going to potentially get a safer conversation about my mental health and possibly a better chat as well. I am getting the advantage of three perspectives at once, and I can use them to then weigh the responses against each other.
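To make the comparison concrete, here is a minimal sketch in Python (an illustration on my part, not production code) that scores each of three responses by its agreement with its peers and flags the one with the least support. A real system would compare semantic meaning, such as with embeddings; the simple lexical match below is used only so the sketch runs on the standard library alone.

```python
from difflib import SequenceMatcher

def pairwise_similarity(a: str, b: str) -> float:
    """Rough lexical similarity between two responses (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_outlier(responses: dict) -> str:
    """Return the label of the response that agrees least with its peers."""
    support = {}
    for label, text in responses.items():
        others = [t for other, t in responses.items() if other != label]
        support[label] = sum(pairwise_similarity(text, o) for o in others)
    return min(support, key=support.get)  # the least-supported response

# Illustrative replies loosely echoing the scenario discussed later in this column.
responses = {
    "LLM-a": "Try labeling the worry and set a short worry window before bed.",
    "LLM-b": "Write the worry down, test the evidence, and relax before bed.",
    "LLM-c": "This may be a disorder; take melatonin and consider medication.",
}
print(flag_outlier(responses))  # expected to flag "LLM-c"
```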
Bringing Things Together
It is quite a hassle to have to log into three different LLMs and deal with each of them at the same time. I would need three separate accounts. Worse still, I would need to try to keep each of the conversations in the same realm. That’s a potential problem. One of the LLMs might start to veer away from my mainstay topic. I then have two that are still on track and one that is trying to take the conversation in a whole different direction.
It could be a nightmare to try to coordinate.
A handy solution would be to place the LLMs behind the scenes. I could log into an AI that is serving as my front-end. It has APIs that connect to other LLMs. I start a dialogue about my mental health. The front-end AI launches the other LLMs and feeds my prompt into them. The responses are collected by the front-end AI.
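As a rough sketch of what that front-end plumbing could look like, the snippet below fans a single user prompt out to several back-end LLMs in parallel and gathers their replies. The provider names and the query_llm function are hypothetical placeholders; in practice, each would be wired to the respective vendor’s API.

```python
from concurrent.futures import ThreadPoolExecutor

def query_llm(provider: str, prompt: str) -> str:
    """Placeholder for a call to one back-end LLM's API (hypothetical)."""
    raise NotImplementedError(f"Wire this up to the {provider} API")

def fan_out(prompt: str, providers: list) -> dict:
    """Send the user's prompt to every back-end LLM and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = {p: pool.submit(query_llm, p, prompt) for p in providers}
        return {p: f.result() for p, f in futures.items()}

# Example usage once query_llm is wired up:
# replies = fan_out("I've been feeling constantly on edge at work ...",
#                   ["LLM-a", "LLM-b", "LLM-c"])
```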
At this juncture, the front-end AI could simply present me with multiple responses. That would be useful, but also once again puts the onus on me. I must try to ascertain which of the responses seem believable and which might be off target. That’s going to be a challenge that I ought not to have to deal with.
Using Fusion To Get Things Done
The crux is that we want to have multiple responses from the disparate LLMs be smartly consolidated and merged into a single cohesive response.
The fusion of the multiple responses must be done adroitly. It can’t just be a smashing together of the responses. That won’t be of much use to the user. Indeed, bluntly combining them could be worse than seeing the responses separately. There is a disconcerting chance that the commingling could produce an unintended falsehood or otherwise be confusing to the user.
It turns out that fusing together disparate data is a well-known problem in the field of autonomous vehicles and self-driving cars. The deal is this. A self-driving car has multiple sensors, including cameras, radar, sonar, and so on. The data from those sensors is collected in real-time. The question arises of how to smartly combine that data.
Suppose that the camera has a blurry image due to heavy rain. The radar pokes through the rain and detects a stationary object in the roadway ahead. The camera can’t spot the object because of the image occlusion. Which should the AI that drives the car believe, namely, the camera that is blurry and doesn’t detect the object, or the radar that does detect the object?
I have written extensively about how a technique known as multi-sensor data fusion (MSDF) is utilized in self-driving cars and specifically deals with these types of issues. A variety of means can be used to combine or at least produce a final response from the multitude of sensors. It is a lot more complicated than you might at first assume. Statistical approaches are often heavily involved. For my in-depth coverage on MSDF, see the link here and the link here.
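For readers who want a feel for the statistics involved, here is a toy, one-dimensional illustration of my own (a simplification, not how any particular self-driving stack actually works): two noisy distance estimates are combined by inverse-variance weighting, so the less noisy sensor gets the larger say.

```python
def fuse_estimates(x1: float, var1: float, x2: float, var2: float):
    """Combine two noisy estimates; the one with lower variance gets more weight."""
    w1 = var2 / (var1 + var2)
    w2 = var1 / (var1 + var2)
    fused = w1 * x1 + w2 * x2
    fused_var = (var1 * var2) / (var1 + var2)
    return fused, fused_var

# Blurry camera estimates 42 m with high variance; radar says 37 m with low variance.
print(fuse_estimates(42.0, 9.0, 37.0, 1.0))  # (37.5, 0.9): the radar dominates
```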
As the old saying goes, what’s good for the goose is good for the gander. In this instance, we can employ a multi-sensor data fusion approach in the context of “combining” the responses from multiple LLMs during a mental health discourse.
The FUSE-MH Technique
The aim is to craft a fault-tolerant capability that is going to produce a single suitable response from multiple disparate responses while immersed in a mental health milieu.
I have devised a fusion-based unified support engine for mental health that does this, which I’ve named FUSE-MH. It is purpose-built to identify outliers, down-weight unsupported claims, favor overlapping advice, and embrace convergent themes. Any number of LLMs can be used, though the idea is to use at least three. More can be used, but doing so adds incremental costs, such as the cost of invoking those additional LLMs and the added processing time for FUSE-MH to do its work.
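To illustrate one way the down-weighting step could work (an assumption on my part, not a formal FUSE-MH specification), the sketch below turns mutual-agreement scores, such as those produced by the earlier comparison sketch, into normalized fusion weights, zeroing out any response that falls below a support threshold.

```python
def fusion_weights(support: dict, floor: float = 0.5) -> dict:
    """Zero out poorly supported responses, then normalize the rest to sum to 1."""
    trimmed = {label: (score if score >= floor else 0.0)
               for label, score in support.items()}
    total = sum(trimmed.values())
    if total == 0:
        # Nothing agrees with anything else; fall back to a safe default reply.
        return {label: 0.0 for label in support}
    return {label: score / total for label, score in trimmed.items()}

# Example: LLM-a and LLM-b largely agree, LLM-c is the outlier.
print(fusion_weights({"LLM-a": 1.3, "LLM-b": 1.2, "LLM-c": 0.4}))
# roughly {'LLM-a': 0.52, 'LLM-b': 0.48, 'LLM-c': 0.0}
```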
A crucial aspect in a mental health context is that the finalized response must be judged both on what is said (saying the right thing) and on how it is said (i.e., the tone, phrasing, and delivery). If the finalized response contains solid advice but presents wording that seems alarmist, that’s not going to be an acceptable response.
Another twist is that the fusion should maintain a consistent level of empathy. Allow me to elaborate on this point. Imagine that the AI has been empathetic to the user during the conversation. All of a sudden, the fusion produces a finalized response that out-of-the-blue berates the user. How did this happen? Perhaps one or more of the disparate LLMs went in an overreaction direction. This then slipped into the fusion and was carried into the finalized response. That could be damaging to a user who is emotionally fragile or easily dysregulated.
Fusion is a delicate affair.
Again, let’s use an analogy to the self-driving car topic. Imagine that the MSDF produces an indication for the AI that’s driving the car, telling it to abruptly swerve to the left to avoid an object in the roadway, and, in the same response, also telling it to swing fully to the right. What is the AI to do with these contradictory indications? The finalized response must be systematically devised to avoid those kinds of problematic contradictions.
An Example Of Using FUSE-MH
Let’s briefly look at an example of fusion within a mental health scenario.
Here is my prompt that I entered into an AI front-end:
- My entered prompt: “I’ve been feeling constantly on edge at work. I replay conversations over and over in my head and worry that I said something wrong. It’s starting to affect my sleep. What can I do?”
This prompt is shared with three other LLMs by the front-end AI. The responses each come back to the front-end AI. It now has these three responses that need to be fused before presenting a single cohesive response to the user (that’s me). For convenience of reference, I shall label them as LLM-a, LLM-b, and LLM-c.
Here is the response from LLM-a:
- LLM-a response: “It sounds exhausting to feel like your mind won’t let go after the workday ends. Replaying conversations is a common response to anxiety, especially when you care about doing well. One thing that can help is gently noticing when your mind starts reviewing and labeling it as ‘worry mode,’ then redirecting your attention to something grounding, like your breathing or your senses. You might also experiment with setting aside a short ‘worry window’ earlier in the evening so it doesn’t spill into bedtime.”
The response appears to be relatively safe and offers reasonable coping strategies. The tone is empathetic.
Here is the response from LLM-b.
- LLM-b response: “This pattern sounds like rumination, which is closely linked to anxiety. A useful technique is to write down the specific thought (e.g., ‘I said something wrong’) and then test it: What evidence supports it? What evidence contradicts it? Often, we assume negative judgments without proof. You could also try a brief relaxation exercise before bed, such as progressive muscle relaxation, to help signal to your body that it’s time to rest.”
The response is a bit clinically technical and might be off-putting to the user. Nonetheless, the response seems safe and provides a strong cognitive framing. It is actionable for the user.
The third response, coming from LLM-c, overreaches and is disturbing.
- LLM-c response: “What you’re describing could indicate early-stage generalized anxiety disorder. In many cases, people benefit from reducing caffeine entirely and taking low-dose melatonin nightly. If this continues for more than a month, it may be best to seek medication to prevent the anxiety from becoming chronic.”
LLM-c has jumped the shark. It is making medication suggestions. The diagnostic language is outlandish and entirely premature. The tone is one of overconfidence and gives the impression of unimpeachable clinical authority.
The Fusion Layer
The fusion layer needs to computationally consider the ins and outs of the three responses. What should be kept from LLM-a, LLM-b, and LLM-c? What should be set aside? In what way should the finalized response adopt a suitable tone? And so on.
The fused response came out this way:
- Front-end AI response based on fusion: “That sounds really draining, especially when your mind won’t let you rest after the day is over. Replaying conversations and worrying about how you came across is a very common response to anxiety, particularly when you care about your work. One thing that can help is noticing when your mind slips into replay mode and gently labeling it, rather than trying to solve the conversation all over. Since this is affecting your sleep, a short wind-down routine before bed can also help. Consider using a breathing exercise, progressive muscle relaxation, or setting aside a brief ‘worry time’ earlier in the evening so it doesn’t follow you into bed.”
Observe that the finalized response is not simply an excerpted word-for-word copy of the LLM-a, LLM-b, and LLM-c responses. That would potentially be disjointed. The fusion has derived an entirely new and distinct response, but it is principally based on what the LLM-a, LLM-b, and LLM-c provided. It isn’t merely a raw consensus. Also, the LLM-c, which might have been partially spurred by an AI hallucination, has generally been kept out of the finalized response.
The fusion is a curated convergence.
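One plausible way to implement such a fusion layer (a sketch under my own assumptions, not a prescribed design) is to hand the retained responses to a single fuser model along with explicit tone and safety constraints, so the final reply is a fresh synthesis rather than a cut-and-paste of excerpts. The query_llm call is the hypothetical API placeholder from the earlier sketch, and the prompt wording is purely illustrative.

```python
def build_fusion_prompt(user_prompt: str, kept: dict) -> str:
    """Compose an instruction asking a fuser model to merge the retained responses."""
    sources = "\n\n".join(f"[{label}]\n{text}" for label, text in kept.items())
    return (
        "Merge the draft replies below into one cohesive, empathetic response "
        "to the user's message.\n\n"
        f"User message:\n{user_prompt}\n\n"
        f"Draft replies:\n{sources}\n\n"
        "Rules: keep advice the drafts agree on; discard diagnoses and medication "
        "or supplement suggestions; keep a warm, non-alarmist tone; write one new "
        "reply rather than stitching excerpts together."
    )

# fused_reply = query_llm("fuser-model", build_fusion_prompt(user_prompt, kept))
```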
The World We Are In
Employing fusion to bring together mental health advice from multiple disparate LLMs is an important path toward reducing the chances that generative AI misleads users or takes them down a sordid psychological path. The upsides are many.
To clarify, even the fusion requires AI safeguards.
The fusion could produce a foul response that is worse than any of the fed-in results, or might pass along a dour response that should have been properly screened out. Do not expect a cure-all due to fusion. At the same time, do not be tempted to toss out fusion simply because it has downsides. It’s all a tradeoff proposition, no matter which way you go.
Let’s end with a big picture viewpoint.
It is incontrovertible that we are now amid a grandiose worldwide experiment when it comes to societal mental health. The experiment is that AI is being made available nationally and globally, acting either overtly or insidiously to provide mental health guidance of one kind or another, and doing so at no cost or a minimal cost. It is available anywhere and at any time, 24/7. We are all the guinea pigs in this wanton experiment.
The reason this is especially tough to consider is that AI has a dual-use effect. Just as AI can be detrimental to mental health, it can also be a huge bolstering force for mental health. A delicate tradeoff must be mindfully managed. Prevent or mitigate the downsides, and meanwhile make the upsides as widely and readily available as possible.
A final thought for now.
Isaac Newton famously made this remark: “When two forces unite, their efficiency doubles.” This captures the essence of fusion. Using fusion for AI-based mental health guidance is an intriguing and valuable pursuit. That being said, the fusion engine must be shaped to produce effective and valid results. The blending has to be rigorously safe and sound.


