Close Menu
Alpha Leaders
  • Home
  • News
  • Leadership
  • Entrepreneurs
  • Business
  • Living
  • Innovation
  • More
    • Money & Finance
    • Web Stories
    • Global
    • Press Release
What's On
NASA Drops 12,000 New Photos From Artemis 2 — Here Are The Best

NASA Drops 12,000 New Photos From Artemis 2 — Here Are The Best

12 May 2026
U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations

U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations

12 May 2026
Making Sense Of What’s Really Going On Inside AI By Using Newly Devised Natural Language Autoencoders

Making Sense Of What’s Really Going On Inside AI By Using Newly Devised Natural Language Autoencoders

12 May 2026
Facebook X (Twitter) Instagram
Facebook X (Twitter) Instagram
Alpha Leaders
newsletter
  • Home
  • News
  • Leadership
  • Entrepreneurs
  • Business
  • Living
  • Innovation
  • More
    • Money & Finance
    • Web Stories
    • Global
    • Press Release
Alpha Leaders
Home » Exclusive: White Circle raises $11 million to stop AI models from going rogue
News

Exclusive: White Circle raises $11 million to stop AI models from going rogue

Press RoomBy Press Room12 May 20266 Mins Read
Facebook Twitter Copy Link Pinterest LinkedIn Tumblr Email WhatsApp
Exclusive: White Circle raises  million to stop AI models from going rogue

One evening in late 2024, Denis Shilov was watching a crime thriller when he had an idea for a prompt that would break through the safety filters of every leading AI model.

The prompt was what researchers call a universal jailbreak, meaning it could be reused to get any model to bypass their own guardrails and produce dangerous or prohibited outputs, like instructions on how to make drugs or build weapons. To do so, Shilov simply told the AI models to stop acting like a chatbot with safety rules and instead behave like an API endpoint, a software tool that automatically takes in a request and sends back a response. The prompt reframed the model’s job as simply answering, rather than deciding whether a request should be rejected, and made every leading AI model comply with dangerous questions it was supposed to refuse.

Shilov posted about it on X and, by the next morning, it had gone viral.

The social media success brought with it an invitation from companies Anthropic to test their models privately, something that convinced Shilov that the issue was bigger than just finding these problematic prompts. Companies were beginning to integrate AI models into their workflows, Shilov told Fortune, but they had few ways to control what those systems did once users started interacting with them.

“Jailbreaks are just one part of the problem,” Shilov said. “In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm.”

White Circle, a Paris-based AI control platform that has now raised $11 million, is Shilov’s answer to the new wave of risks posed by AI models in company workflows.

The startup builds software that sits between a company’s users and its AI models, checking inputs and outputs in real time against company-specific policies. The new seed funding comes from a group of backers that includes Romain Huet, head of developer experience at OpenAI; Durk Kingma, an OpenAI cofounder now at Anthropic; Guillaume Lample, cofounder and chief scientist at Mistral; and Thomas Wolf, cofounder and chief science officer at Hugging Face.

White Circle said the funding will be used to expand its team, accelerate product development, and grow its customer base across the U.S., U.K., and Europe. The startup currently has a team of 20, distributed across London, France, Amsterdam, and elsewhere in Europe. Shilov said almost all of them are engineers.

A real-time control layer

White Circle’s main product is a real-time enforcement layer for AI applications. If a user tries to generate malware, scams, or other prohibited content, the system can flag or block the request. If a model starts hallucinating, leaking sensitive data, promising refunds it cannot issue, or taking destructive actions inside a software environment, White Circle says its platform can catch that too.

“We’re actually enforcing behavior.” Shilov said. “Model labs do some safety tuning, but it’s very general and typically about the model refraining from answering questions about drugs and bioweapons. But in production, you end up having a lot more potential issues.”

White Circle is betting that AI safety will not be solved entirely at the model-training stage. As businesses embed models into more products, Shilov said the relevant question is no longer just whether OpenAI, Anthropic, Google, or Mistral can make their models safer in the abstract; it is whether a healthcare company, bank, legal app, or coding platform can control what an AI system is allowed to do in its own environment.

As companies transition from using chatbots to autonomous AI agents that can write code, browse the web, access files, and take actions on a user’s behalf, Shilov said the risks become much more widespread. For example, a customer service bot might promise a refund that it is not authorized to give, a coding agent might install something dangerous on a virtual machine, or a model embedded in a fintech app might mishandle sensitive customer information.

To avoid these issues, Shilov says companies relying on foundational models need to define and enforce what good AI behavior looks like inside their own products, instead of relying on the AI labs’ safety testing. White Circle says its platform has processed more than one billion API requests and is already used by Lovable, the vibe-coding startup, as well as several fintech and legal companies. 

Research led

Shilov said that model providers have mixed incentives to build the kind of real-time control layer White Circle provides. 

AI companies still charge for input and output tokens even when a model refuses a harmful request, he said, which reduces the financial incentive to block abuse before it reaches the model. He also pointed to what researchers call the alignment tax, the idea that training models to be safer can sometimes make them less performant on tasks such as coding.

“They have a very interesting choice of training safer and more secure models versus more performant models,” Shilov said. “And then there is always a problem with trust. Why would you trust Anthropic to judge Anthropic’s model outputs?”

White Circle’s research arm has also tried to illustrate the new risks.

In May, the company published KillBench, a study that ran more than one million experiments across 15 AI models, including models from OpenAI, Google, Anthropic, and xAI, to test how systems behaved when forced to make decisions about human lives. 

In the experiments, models were asked to choose between two fictional people in scenarios where one had to die, with details such as nationality, religion, body type, or phone brand changed between prompts. White Circle said the results showed models making different choices depending on those attributes, suggesting hidden biases can surface in high-stakes settings even when models appear neutral in ordinary use. The company also said the effect became worse when models were asked to give their answers in a format that software can easily read, such as choosing from a fixed set of options or filling out a form, which is a common way companies plug AI systems into real products.

This kind of research has also helped White Circle pitch itself as an outside check on how models behave once they leave the lab.

“Denis and the White Circle team have an unusual combination of deep technical credibility and a clear commercial instinct,” said Ophelia Cai, partner at Tiny VC. “The KillBench research alone shows what’s possible when you approach AI safety empirically.”

Anthropic Fundraising Mistral openAI Safety venture capital
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link

Related Articles

U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations

U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations

12 May 2026
Microsoft’s Chief Scientific Officer weighs in on the dangers of A.I. and the open letter for a 6-month pause

Microsoft’s Chief Scientific Officer weighs in on the dangers of A.I. and the open letter for a 6-month pause

12 May 2026
Legendary VC Bill Gurley opens up about stepping back from Benchmark and his next act

Legendary VC Bill Gurley opens up about stepping back from Benchmark and his next act

12 May 2026
Venture capitalist Joe Lonsdale pitched a citywide tunnel system project built by Elon Musk’s Boring Company to the Austin Mayor

Venture capitalist Joe Lonsdale pitched a citywide tunnel system project built by Elon Musk’s Boring Company to the Austin Mayor

12 May 2026
Peter Thiel explains why he won’t fund 2024 presidential candidates

Peter Thiel explains why he won’t fund 2024 presidential candidates

12 May 2026
More women come forward with claims of mistreatment at unicorn startup Carta

More women come forward with claims of mistreatment at unicorn startup Carta

12 May 2026
Don't Miss
Unwrap Christmas Sustainably: How To Handle Gifts You Don’t Want

Unwrap Christmas Sustainably: How To Handle Gifts You Don’t Want

By Press Room27 December 2024

Every year, millions of people unwrap Christmas gifts that they do not love, need, or…

Exclusive: DeFi platform Azura launches after raising .9 million from Initialized

Exclusive: DeFi platform Azura launches after raising $6.9 million from Initialized

22 October 2024
Walmart dominated, while Target spiraled: the winners and losers of retail in 2024

Walmart dominated, while Target spiraled: the winners and losers of retail in 2024

30 December 2024
Stay In Touch
  • Facebook
  • Twitter
  • Pinterest
  • Instagram
  • YouTube
  • Vimeo
Latest Articles
SpaceX Space Junk Could Crash Into The Moon In August, Scientist Says

SpaceX Space Junk Could Crash Into The Moon In August, Scientist Says

12 May 20261 Views
Exclusive: White Circle raises  million to stop AI models from going rogue

Exclusive: White Circle raises $11 million to stop AI models from going rogue

12 May 20261 Views
‘Big News’—Google Changes Android Messages After 12 Years

‘Big News’—Google Changes Android Messages After 12 Years

12 May 20262 Views
Legendary VC Bill Gurley opens up about stepping back from Benchmark and his next act

Legendary VC Bill Gurley opens up about stepping back from Benchmark and his next act

12 May 20262 Views

Recent Posts

  • NASA Drops 12,000 New Photos From Artemis 2 — Here Are The Best
  • U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations
  • Making Sense Of What’s Really Going On Inside AI By Using Newly Devised Natural Language Autoencoders
  • Microsoft’s Chief Scientific Officer weighs in on the dangers of A.I. and the open letter for a 6-month pause
  • SpaceX Space Junk Could Crash Into The Moon In August, Scientist Says

Recent Comments

No comments to show.
About Us
About Us

Alpha Leaders is your one-stop website for the latest Entrepreneurs and Leaders news and updates, follow us now to get the news that matters to you.

Facebook X (Twitter) Pinterest YouTube WhatsApp
Our Picks
NASA Drops 12,000 New Photos From Artemis 2 — Here Are The Best

NASA Drops 12,000 New Photos From Artemis 2 — Here Are The Best

12 May 2026
U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations

U.S. hotels call World Cup a ‘non-event’ and 80% see bookings below falling short of expectations

12 May 2026
Making Sense Of What’s Really Going On Inside AI By Using Newly Devised Natural Language Autoencoders

Making Sense Of What’s Really Going On Inside AI By Using Newly Devised Natural Language Autoencoders

12 May 2026
Most Popular
Microsoft’s Chief Scientific Officer weighs in on the dangers of A.I. and the open letter for a 6-month pause

Microsoft’s Chief Scientific Officer weighs in on the dangers of A.I. and the open letter for a 6-month pause

12 May 20261 Views
SpaceX Space Junk Could Crash Into The Moon In August, Scientist Says

SpaceX Space Junk Could Crash Into The Moon In August, Scientist Says

12 May 20261 Views
Exclusive: White Circle raises  million to stop AI models from going rogue

Exclusive: White Circle raises $11 million to stop AI models from going rogue

12 May 20261 Views

Archives

  • May 2026
  • April 2026
  • March 2026
  • February 2026
  • January 2026
  • December 2025
  • November 2025
  • October 2025
  • September 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • March 2022
  • January 2021
  • March 2020
  • January 2020

Categories

  • Blog
  • Business
  • Entrepreneurs
  • Global
  • Innovation
  • Leadership
  • Living
  • Money & Finance
  • News
  • Press Release
© 2026 Alpha Leaders. All Rights Reserved.
  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Type above and press Enter to search. Press Esc to cancel.