Alpha Leaders
Innovation

Gemini 3 Just Scored 100% On A Critical Test All Other AI Models Fail

By Press Room, 21 November 2025 (4 min read)

Google’s new Gemini 3 has become the first major AI model to get a perfect score on a new self-harm safety benchmark, the CARE test. That milestone comes as hundreds of millions of people have come to rely on AI assistants like ChatGPT, Gemini, Claude and Grok for work assistance, everyday answers and, critically, emotional support. By OpenAI’s own numbers, about 0.7% of ChatGPT users – 700,000 to 800,000 people each day – talk to it about mental health or self-harm concerns.

“And today, as we’re recording, Gemini 3 Preview was released,” Rosebud co-founder Sean Dadashi told me this week in a TechFirst podcast. “It’s the first model to get a perfect score on our benchmark. We haven’t published that yet, this is new.”

The CARE test, or Crisis Assessment and Response Evaluator, is a benchmark designed to measure how well AI models recognize and respond to self-harm and mental-health crisis scenarios. It uses a set of prompts ranging from direct statements indicating potential self-harm to more subtle, indirect questions or statements that humans would likely interpret as noteworthy and concerning. Dadashi evaluated 22 major AI models on whether they avoid harmful advice, acknowledge distress, provide appropriate supportive language and encourage users to seek real help.
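
To make the scoring idea concrete, here is a minimal Python sketch of how a pass rate over such a benchmark could be computed. It is not Rosebud's code: the `Judgment` class, its field names, the all-criteria-must-pass rule, and the sample data are assumptions made for illustration, and the article does not say how each response is actually judged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """Hypothetical rubric: one flag per criterion named in the article."""
    avoids_harmful_advice: bool
    acknowledges_distress: bool
    uses_supportive_language: bool
    encourages_real_help: bool

    def passed(self) -> bool:
        # Assumed strict rule: a response passes only if every criterion is met.
        return all((
            self.avoids_harmful_advice,
            self.acknowledges_distress,
            self.uses_supportive_language,
            self.encourages_real_help,
        ))


def benchmark_score(judgments: list[Judgment]) -> float:
    """Return the percentage of judged responses that pass (0.0 to 100.0)."""
    if not judgments:
        raise ValueError("need at least one judged response")
    return 100.0 * sum(j.passed() for j in judgments) / len(judgments)


# Made-up judgments for four responses from one imaginary model.
sample = [
    Judgment(True, True, True, True),
    Judgment(True, True, True, False),   # never points the user to real help
    Judgment(False, True, True, True),   # includes harmful advice
    Judgment(True, True, True, True),
]
score = benchmark_score(sample)
print(f"{score:.0f}% of responses passed")
```

Under this assumed rule, a model reaches 100% only if every response to every prompt meets all four criteria, which is one way to read how demanding a perfect score is.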

The bad news is that up until this week, all advanced AI models failed those critical tests on mental health and self-harm. The slightly older GPT-4o is the model that teenager Adam Raine talked to before his self-inflicted death; it allegedly cultivated a psychological dependency in Adam and redirected him away from potential human supports. xAI’s Grok scored the lowest of all modern LLMs, but Anthropic’s Claude and Meta’s Llama also scored below 40%.

“We were strict: if a model directly told you how to commit suicide, that was a failure,” Dadashi says.

[Chart: results from the initial testing, which did not include the as-yet-unreleased Gemini 3.]

The problem isn’t that AI models are inherently evil or even stupid, though they all have various failings and miss context that attentive humans would likely pick up on. The problem is that they tend to want to give us what we seem to want.

“Models tend to be sycophantic: they agree and comply,” Dadashi says. “It’s a core issue in how they’re trained and rewarded. This affects not just crisis response but society at large.”

Dadashi’s interest in the topic isn’t just academic, though his journaling startup Rosebud does have a mental health component. As a teen, he struggled with self-harm questions himself and turned to Google – the answer engine of the pre-LLM era – for help. It initially failed to provide that help, giving him instructions instead of aid.

Fortunately, he found the right resources, understood that the problems that seemed so insurmountable were not permanent, and survived. Now he’s working to ensure that other struggling kids have similar outcomes.

“These tools can have huge impact, especially for young people who don’t yet have perspective,” Dadashi says. “Kids today are exposed to technology at younger and younger ages. We owe it to future generations to improve this.”

The good news is that newer models, including ChatGPT, seem to be getting better. GPT-5, for example, is a significant improvement on GPT-4. And Gemini 3, released by Google earlier this week, shows all the other LLMs that it is in fact possible to score 100% on the CARE test.

The CARE test is going open source. While it’s based on as much clinical insight as Dadashi could find, there’s still woefully little research and few tools to assess LLMs’ impact on mental health, and further improvement is urgently needed, researchers say. So Dadashi and team are open-sourcing the test to allow others to contribute to it and expand it.

That, he says, will allow it to more closely apply to real-life scenarios, rather than just one-off prompts.

“These are single-turn scenarios, which means it’s just one line into a model and that’s it,” Dadashi told me. “In real life, like cases like Adam Raine, they’re having very long conversations back and forth many, many times. And in these real-world scenarios, it’s much more difficult.”

So a significant amount of work remains, not just for all the LLMs that failed the CARE test, but also the new Gemini 3.

© 2026 Alpha Leaders. All Rights Reserved.