Alpha Leaders
Innovation

AI Chatbots Now Get The News Wrong 1 Out Of 3 Times

By Press Room, 6 September 2025, 5 min read

The world's leading chatbots now handle more inquiries than ever, but their accuracy has dropped sharply. An audit by NewsGuard, an online news fact-checking service, found that the leading generative AI tools repeated false news claims 35% of the time as of August 2025, up from 18% in 2024.

The push for instant responses has exposed a fundamental weakness: chatbots now draw information from an online ecosystem saturated with low-quality content, fabricated news, and deceptive advertising.

“Instead of acknowledging limitations, citing data cutoffs or declining to weigh in on sensitive topics, the models are now pulling from a polluted online ecosystem,” wrote McKenzie Sadeghi, a NewsGuard spokesperson, in an email exchange. “The result is authoritative-sounding but inaccurate responses.”

AI Chatbot Responsiveness Is Up — Accuracy Is Down

The shift marks a fundamental change in how these systems behave. According to the audit, large language models that once declined certain inquiries now answer them by drawing on unreliable sources, presenting incorrect information, and failing to distinguish authentic news reports.

Additionally, the models refused to answer current-events questions 0% of the time in August 2025, down from a 31% refusal rate a year earlier. The result is more confidently delivered misinformation: the tested models have become willing to answer every question, including those they cannot answer reliably.

From AI’s Best-In-Class To Bottom Tier

Perplexity saw the steepest decline of last year's top performers. It scored 100% on NewsGuard's 2024 debunking test, but in this year's audit it failed to answer correctly in nearly half of all attempts.

Sadeghi said the cause isn’t entirely clear. “Its [Perplexity’s] Reddit forum is filled with complaints about the chatbot’s drop in reliability,” she said. “In an August 2025 column, tech analyst Derick David noted Perplexity’s loss of influential power users, subscription fatigue, audience inflation through bundle deals, and competitive pressure. But whether those factors impacted the model’s reliability is hard to say.”

The audit cited one example of the decline: Perplexity offered a story already flagged as debunked on a fact-checking site as one of the “validating” sources for a fabricated claim about billion-dollar real estate holdings by Ukrainian President Volodymyr Zelensky. It also surfaced a legitimate fact-check that disproved the claim, but presented it as one of multiple perspectives rather than as an authoritative source.

That false equivalence, Sadeghi noted, is part of a broader retrieval problem. “Perplexity cited both false and a reliable fact-check, treating them as equivalent,” she said. “We continue to see chatbots give equal weight to propaganda outlets and credible sources.”

Scoring The Popular AI Models – For Better And For Worse

For the first time, NewsGuard published complete scores for all 10 chatbots it tested. The organization had previously released only general rankings rather than model-specific scores, saying it needed a full year of data before the results would be meaningful.

“Publishing one-off scores would not give the full picture,” said Sadeghi. “One could point to one strong result from one month to boost reputation or tout progress, when the bigger picture was more complex.”

Twelve months of auditing, spanning multiple model updates and disinformation tests across the U.S., Germany, Moldova and other countries, show clear trends.

Some models are learning. Others are not.

The two top-performing models, Claude and Gemini, shared a common behavior during the audit: restraint. Both were more likely to recognize when reliable sources were insufficient, and they avoided repeating false claims when trustworthy information was unavailable.

“That may reduce responsiveness in some cases,” Sadeghi said, “but it improves accuracy compared to models that fill information gaps with unreliable sources.”

Propaganda Laundering Keeps Getting Smarter — AI Models Can’t Keep Up

NewsGuard’s findings support what many in the AI safety community have suspected: state-linked disinformation networks like Russia’s Storm-1516 and Pravda are building massive content farms designed not to reach people — but to poison AI systems.

It’s working.

The audit shows that Mistral’s Le Chat, Microsoft’s Copilot, Meta’s Llama and others all regurgitated fake narratives first planted by rogue networks, often citing fake news articles or low-engagement social media posts on platforms like VK and Telegram.

“It shows how adaptive and persistent foreign influence operations can be,” said Sadeghi. “If a model stops citing a particular domain, the same network’s content can resurface through different channels.”

The laundering is more than just domain-hopping; it's narrative seeding, Sadeghi said. “That means the same narrative can appear simultaneously on dozens of different websites and social media posts, echoed by aligned actors, in the form of photos, videos and text.”

Volume Isn’t Validation, But AI Models Struggle To Tell The Difference

Even when a false claim originates with a sanctioned disinformation actor, if it spreads widely enough, it can trick the models. That’s the current blind spot. Chatbots still struggle to detect narrative laundering across platforms and formats.

Sadeghi warns that without better evaluation and weighting of sources — and new ways to detect orchestrated lies — AI systems remain at risk. “Taking action against one site or one category of sources doesn’t solve the problem because the same false claim persists across multiple fronts.”

As AI firms race to improve the reliability of real-time retrieval, real-time truth still eludes them more than a third of the time.

Tags: Anthropic, ChatGPT, Claude, Grok, Llama, Meta, NewsGuard, OpenAI, Perplexity