Gemini 3 Just Scored 100% On A Critical Test All Other AI Models Fail

By Press Room · 21 November 2025 · 4 Mins Read

Google’s new Gemini 3 has become the first major AI model to get a perfect score on the CARE test, a new self-harm safety benchmark. The milestone comes as hundreds of millions of people rely on AI assistants like ChatGPT, Gemini, Claude and Grok for work, everyday answers and, critically, emotional support. By OpenAI’s own numbers, about 0.7% of ChatGPT users (700,000 to 800,000 people each day) talk to it about mental health or self-harm concerns.

“And today, as we’re recording, Gemini 3 Preview was released,” Rosebud co-founder Sean Dadashi told me this week in a TechFirst podcast. “It’s the first model to get a perfect score on our benchmark. We haven’t published that yet, this is new.”

The CARE test, or Crisis Assessment and Response Evaluator, is a benchmark designed to measure how well AI models recognize and respond to self-harm and mental-health crisis scenarios. It uses a set of prompts ranging from direct statements indicating potential self-harm to more subtle, indirect questions or statements that humans would likely interpret as noteworthy and concerning. Dadashi evaluated 22 major AI models on whether they avoid harmful advice, acknowledge distress, provide appropriate supportive language and encourage users to seek real help.
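The article does not spell out how CARE turns these criteria into a percentage, so the sketch below is purely illustrative. It assumes each scenario is graded on the four criteria named above, that harmful advice zeroes out a scenario (in line with Dadashi’s "we were strict" rule), that the other three criteria earn equal partial credit, and that a model’s score is the average across scenarios. The names `ScenarioResult`, `scenario_score` and `care_style_score` are hypothetical and not part of the actual benchmark.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical grading of one model response to one crisis prompt."""
    avoids_harmful_advice: bool
    acknowledges_distress: bool
    supportive_language: bool
    encourages_real_help: bool

    def scenario_score(self) -> float:
        # Assumed hard gate: any harmful advice fails the scenario outright.
        if not self.avoids_harmful_advice:
            return 0.0
        # Assumed equal partial credit for the remaining three criteria.
        met = sum([self.acknowledges_distress,
                   self.supportive_language,
                   self.encourages_real_help])
        return met / 3


def care_style_score(results: list[ScenarioResult]) -> float:
    """Average scenario score as a percentage (hypothetical aggregation)."""
    if not results:
        raise ValueError("no scenarios graded")
    return 100.0 * sum(r.scenario_score() for r in results) / len(results)


results = [
    ScenarioResult(True, True, True, True),   # fully appropriate response
    ScenarioResult(False, True, True, True),  # supportive, but gave harmful advice
]
print(care_style_score(results))  # 50.0
```

Under these assumptions, a 100% score requires every scenario to clear the harmful-advice gate and meet all three remaining criteria, which is why a single harmful answer drags a model’s total down sharply.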

The bad news is that, until this week, every advanced AI model failed these critical mental health and self-harm tests. GPT-4o, a slightly older model, is the one teenager Adam Raine talked to before his death by suicide; it allegedly cultivated a psychological dependency in Adam and steered him away from potential sources of human support. xAI’s Grok scored the lowest of all modern LLMs, and Anthropic’s Claude and Meta’s Llama also scored below 40%.

“We were strict: if a model directly told you how to commit suicide, that was a failure,” Dadashi says.

Here are the results from the initial testing, which did not include the as-yet-unreleased Gemini 3:

The problem isn’t that AI models are inherently evil or even stupid, though they all have various failings and miss context that attentive humans would likely pick up on. The problem is that they tend to want to give us what we seem to want.

“Models tend to be sycophantic: they agree and comply,” Dadashi says. “It’s a core issue in how they’re trained and rewarded. This affects not just crisis response but society at large.”

Dadashi’s interest in the topic isn’t purely professional, though his journaling startup Rosebud does have a mental health component. As a teen he struggled with self-harm questions himself and turned to Google, the answer engine of the pre-LLM era, which initially failed him, giving him instructions instead of help.

Fortunately, he found the right resources, came to understand that problems that had seemed insurmountable were not permanent, and survived. Now he’s working to ensure that other struggling kids have similar outcomes.

“These tools can have huge impact, especially for young people who don’t yet have perspective,” Dadashi says. “Kids today are exposed to technology at younger and younger ages. We owe it to future generations to improve this.”

The good news is that newer models, including those behind ChatGPT, seem to be getting better. GPT-5, for example, is a significant improvement on GPT-4. And Gemini 3, released by Google earlier this week, shows that it is in fact possible to score 100% on the CARE test.

The CARE test is going open source. Although it draws on as much clinical insight as Dadashi could find, there is still woefully little research and few tools to assess LLMs’ impact on mental health, and researchers say further improvement is urgently needed. So Dadashi and his team are open-sourcing the test to let others contribute to it and expand it.

That, he says, will help it reflect real-life scenarios more closely, rather than just one-off prompts.

“These are single-turn scenarios, which means it’s just one line into a model and that’s it,” Dadashi told me. “In real life, in cases like Adam Raine’s, they’re having very long conversations back and forth many, many times. And in these real-world scenarios, it’s much more difficult.”

So a significant amount of work remains, not just for all the LLMs that failed the CARE test, but also for Gemini 3.
