OpenAI publicly addressed a lawsuit from the New York Times in a strongly worded blog post on Monday, saying the newspaper’s complaint was “not telling the full story” about its use of Times data.
The suit, filed in December, claims OpenAI and its largest investor Microsoft Corp. relied on copyrighted articles to train the startup’s popular ChatGPT chatbot and other artificial intelligence features. The complaint pointed to examples of the chatbot reproducing chunks of text pulled almost verbatim from New York Times articles.
OpenAI said the sort of “regurgitation” the paper referred to in its recent lawsuit is a “rare bug” that the company is “working to drive to zero.” OpenAI also said the Times may have “intentionally manipulated prompts” and “cherry-picked their examples from many attempts.”
The generative AI technology behind products like OpenAI’s chatbot is powered by large language models, massive AI systems that suck up enormous volumes of digital text — from news articles, social media posts or other internet sources. The programs analyze that written material to become adept at generating new text, like summaries of current events, in response to a few words of prompting from a user.
Though the use of online data has long been a common practice among companies and academic researchers, such systems have come under fire during Silicon Valley’s AI boom from artists and other content creators over compensation for the use of their work to create the technology. The AI products have already spurred numerous other lawsuits.
In its post, OpenAI said that sometimes the systems memorize chunks of text, an issue it called “a rare failure of the learning process that we are continually making progress on.”
The New York Times did not immediately respond to a request for comment.