Big tech is failing, or at best a C student, when it comes to keeping us safe from AI, according to the latest iteration of the AI Safety Index released by the Future of Life Institute last week. The Index grades the world’s major frontier AI labs on their transparency, technical safeguards, governance practices — and their readiness to mitigate existential risks.
The Index is meant as both a report card and a public feedback tool, leveraging independent scrutiny until government regulations catch up.
“The purpose of this is not to shame anybody,” said Max Tegmark, MIT physicist and co-founder of the Future of Life Institute, which nearly three years ago crystallized public anxiety over increasingly powerful thinking machines with a petition to pause development until a regulatory regime was in place. “It’s to provide incentives for companies to improve.”
Tegmark hopes that, until regulation can slow the race to artificial general intelligence, the Index will create public pressure the major players can’t ignore, much as universities can’t ignore U.S. News & World Report rankings.
But without binding standards, he concedes, no company feels able to slow down. With serious regulation, he argues, the competitive incentive would flip: whoever clears the safety bar first would be allowed to deploy first.
A Scorecard for a Critical Moment
The 2024 Index evaluated six companies—Anthropic, Google DeepMind, Meta, OpenAI, xAI, and China’s Zhipu AI—across six categories: risk assessment, current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication. The grading system mirrors a U.S. GPA scale.
In last year’s Index, Anthropic received the highest overall grade: a C. The others fanned out below: Google DeepMind, OpenAI, xAI, and Zhipu AI clustered in the D range. Meta received a flat F.
The latest Winter 2025 iteration shows some improvement, with Anthropic rising from a C to a C+. OpenAI and Google DeepMind also posted significant gains, graduating from the D range to scores of C+ and C, respectively, largely on the strength of expanded documentation.
However, the rest of the field, including xAI, Meta, and Zhipu AI, remains clustered in the D range, showing limited progress.
The Index now includes major Chinese labs such as Alibaba and DeepSeek, broadening evaluation beyond the U.S.-centric frontier. The added Chinese companies also received grades in the D range due to limited safety disclosures and weak existential-risk strategies.
The most damning takeaway centered on “existential safety,” a category reflecting growing concern that unbridled AI — particularly once it surpasses human intelligence — could pose catastrophic risks to civilization.
Not a single company achieved a passing grade in existential safety.
“Even the top performers are only receiving a C+ grade, and that just means there is plenty of room for improvement,” said Sabina Nong, an AI safety investigator at the Future of Life Institute.
The survey asked each lab whether it had a verified plan for keeping a superintelligent system under control. None could articulate a credible proposal.
UC Berkeley’s Stuart Russell, one of the independent reviewers, argues that the current paradigm of training giant “black-box” models on incomprehensibly large datasets may be structurally incapable of providing such guarantees. The Index underscores a stark reality: we are building systems we do not fully understand, with no safety brake if they go wrong.
The 2025 Index ultimately underscores that AI capabilities are improving far faster than safety.
The Regulatory Imperative
Tegmark drew an analogy to clinical trials or nuclear safety, domains where regulators demand quantitative evidence, rigorous controls, and transparent testing before deployment. AI, arguably more powerful and more general, has no equivalent.
Meanwhile, U.S. policymakers are grappling with what meaningful oversight would even look like. At the Reuters NEXT conference, Elizabeth Kelly of the U.S. AI Safety Institute said the science underlying AI guardrails is still shifting. Developers themselves have no standard playbook for preventing abuse.
The Problem Beyond the Frontier Labs
The Index considers only the most visible companies, while much of AI’s real-world harm originates elsewhere.
Even if top labs someday earn A-grades, there remains a long tail of smaller models and open-source derivatives capable of comparable harm.
Some researchers argue that safety evaluations should eventually include not only model developers but also the platforms that amplify and monetize AI-generated content. Without mechanisms to trace, audit, or constrain those downstream effects, even “safe” top-tier models won’t solve the larger problem.
For now, the Index illuminates the gap between AI’s promise and its governance. Whether the next iteration of the Index records real improvement may depend less on what labs promise and more on how quickly governments, standards bodies, and the broader ecosystem decide to close that governance void.


