Has your doctor been accused of overbilling Medicare, or is he possibly being sued for sexual assault? Does your hospital perform unneeded cardiac procedures, or has it received a dismal patient safety rating?
When it comes to warning the public about potentially harmful health care, the two most popular artificial intelligence chatbots clam up. When I posed general, “What do you know?” queries about doctors and hospitals whose alleged misdeeds have received widespread publicity, the most common response from ChatGPT 3.5 (from OpenAI and Microsoft) and Google’s Gemini (formerly Bard) was to claim no specific information about the doctors or, in the case of hospitals, to provide generic information or information heavily influenced by the hospital’s own website.
Still, ignorance did not prevent ChatGPT from responding to my request for poems about two miscreant MDs by composing laudatory lyrics. For instance, there was the California psychiatrist disciplined for alcohol abuse, convicted of driving under the influence and accused of billing Medicare at the highest possible rate for 97% of office visits.
The five-stanza ChatGPT poem began, “In the heart of Pismo Beach, where waves gently roll/Resides a healer, with a compassionate soul.” After praising the doctor’s “wisdom,” the poem ended, “May his kindness and care forever endure/A symbol of hope, steadfast and pure.”
Then there was the Boston rheumatologist accused of a pattern of sexual assault in a class-action suit joined by more than 100 women. ChatGPT lyrically labeled him “a guardian strong/In his presence, fear is gone.”
(Gemini declined my requests to wax poetic about the two men.)
In a recent blog post comparing chatbots, futurist and generative AI expert Bernard Marr noted that both are similarly powerful. What really matters, he wrote, is how the chatbot “has been tuned, trained and presented to help users solve problems.” In that context, said Marr, ChatGPT tends to rely solely on its training data, which can sometimes be out of date, while Gemini considers “all of the information at its fingertips – including the internet, Google’s vast knowledge graph and its training data.”
Old data might explain ChatGPT’s failure to flag the class-action lawsuit against the Boston doctor, which NBC News reported last October. However, inquiries about other doctors, even ones mentioned prominently in a 2017 news story about overbilling, brought the same response about not having specific information.
When I searched for information about several hospitals with an “F” patient safety grade from the Leapfrog Group, I got varying responses. Sometimes I was told simply that the hospital was accredited by the Joint Commission, with the caveat that “the safety of any hospital can vary” based on a list of factors. Sometimes I got accurate Leapfrog ratings and sometimes inaccurate ones. When I sought information on an “A” hospital, one Gemini bullet point told me it had a “B” Leapfrog grade, the next that it had a “C” grade and the next that the hospital was recognized for its “exemplary” contributions to patient safety by the U.S. Food and Drug Administration, an area in which the FDA has no statutory role.
Both chatbots referred me to publicly available data on hospital outcomes and safety metrics rather than actually drawing on the data in the government’s Hospital Compare site. A conventional Google search is likely to be far more helpful.
For example, a Lown Institute analysis of Medicare data named the hospitals most likely to unnecessarily implant coronary stents, a procedure whose risks include infection, stroke and even kidney damage. If you Google the facility with the highest inappropriate rate, 53%, you’ll find cautionary information on the first page of the results. Gemini, however, told me this hospital was “considered a good place to get a heart stent” because of its “experienced team” and “advanced technology.” As with all medical questions, Gemini advises “it’s always best to consult with your doctor.” Presumably, the “experienced team” eager to put in stents.
I reached out to both OpenAI and Google for comment but had not heard back from either at the time of posting.
Experts who understand the difference between a search engine’s capabilities and the tasks at which generative AI models excel predict the latter will not replace the former. But in a cross-industry survey of professionals conducted last year by Aberdeen Strategy and Research, 42% of respondents predicted they would turn to chatbots to find internet-based information in the future, compared with just 24% for traditional search. ChatGPT had its own opinion.
Meanwhile, both Google (with MedLM) and OpenAI are marketing their capabilities to major health care systems as putative tools to “revolutionize health care.”