We all understand the concept of inbreeding in human terms: when people who are too genetically similar reproduce, their offspring face a higher risk of genetic disorders and deformities. With each inbred generation, the gene pool becomes less and less diverse.
But how could this possibly translate to generative AI? And why should we be concerned about generative AI inbreeding? Read on to find out.
What Is Inbreeding In Relation To Generative AI?
The term refers to the way in which generative AI systems are trained. The earliest generative AI systems, including large language models (LLMs), were trained on massive quantities of text, images and audio, typically scraped from the internet. We’re talking about books, articles, artworks, and other content available online, content that was, by and large, created by humans.
Now, however, a plethora of generative AI tools are flooding the internet with AI-generated content, from blog posts and news articles to AI artwork. This means that future AI tools will be trained on datasets that contain more and more AI-generated content: content that isn’t created by humans but simulates human output. And as new systems learn from this simulated content and create their own content based on it, the risk is that the content will get progressively worse with each generation. It’s like taking a photocopy of a photocopy of a photocopy.
It’s not dissimilar to human or livestock inbreeding, then. The “gene pool” – in this case, the content used to train generative AI systems – becomes less diverse. Less interesting. More distorted. Less representative of actual human content.
What Would This Mean For Generative AI Systems?
Inbreeding could pose a significant problem for future generative AI systems, rendering them less and less able to accurately simulate human language and creativity. One study found that this kind of inbreeding makes generative AIs less effective, concluding that “without enough fresh real data in each generation … future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease.”
In other words, AIs need fresh, human-generated data to keep improving over time. If the data they’re trained on is increasingly generated by other AIs, you end up with what’s called “model collapse,” which is a fancy way of saying the AIs get dumber. This can happen with any sort of generative AI output, not just text but also images. This video shows what happens when two generative AI models bounce back and forth between each other in a loop: one AI describes an image, the other creates a new image based on that description, and so on. The starting point was the Mona Lisa, one of the world’s great masterpieces. The end result is just a freaky picture of squiggly lines.
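To make the “photocopy of a photocopy” effect concrete, here is a deliberately simplified toy sketch, not how the cited study or any real LLM works. It assumes the “human” data is a simple bell curve (a normal distribution with mean 0 and standard deviation 1), and that each new “model” just fits a bell curve to a small sample drawn from the previous model. The fitted standard deviation stands in for diversity. The function name `simulate_collapse` and its parameters, including `fresh_fraction` (the share of each training sample that is fresh human data), are hypothetical and exist only for this illustration.

```python
import random
import statistics


def simulate_collapse(generations: int = 200, sample_size: int = 10,
                      fresh_fraction: float = 0.0, seed: int = 0) -> list[float]:
    """Hypothetical toy model of generative "inbreeding".

    The "human" data is a standard normal distribution (mean 0, std 1).
    Each generation fits a Gaussian (mean and population std) to a small
    training sample. A share `fresh_fraction` of that sample is fresh human
    data; the rest is synthetic output drawn from the previous generation's
    fitted model. Returns the fitted std per generation (a diversity proxy).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the human distribution
    history = [sigma]
    n_fresh = round(sample_size * fresh_fraction)
    for _ in range(generations):
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size - n_fresh)]
        samples = fresh + synthetic
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate of the std
        history.append(sigma)
    return history


if __name__ == "__main__":
    inbred = simulate_collapse()  # trained only on the previous model's output
    mixed = simulate_collapse(sample_size=20, fresh_fraction=0.5)  # half fresh data
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: no fresh data std = {inbred[gen]:.6f}, "
              f"50% fresh data std = {mixed[gen]:.6f}")
```

With no fresh data, small random errors in each fit compound, and the spread tends to shrink toward zero over the generations: each “model” produces an ever narrower, blander range of outputs. Mixing in fresh human data at every generation anchors the spread and keeps it from collapsing, which mirrors the study’s point about needing “enough fresh real data in each generation.” Real generative models are vastly more complex, so treat this only as an intuition pump.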
Imagine this in terms of a customer service chatbot that gets progressively worse, producing increasingly clunky, robotic or even nonsensical responses. That’s the danger for generative AI systems: inbreeding could, in theory, render them pointless, which defeats the purpose of using generative AI in the first place. We want these systems to do a good job of representing human language and creativity, and to get smarter and better at responding to our requests over time. If they can’t do that, what’s the point of them?
Perhaps The Bigger Question Is, What Does All This Mean For Humans?
We’ve all seen the hilariously weird images created by generative AI. You know what I mean – hands popping out of places they shouldn’t, nightmarish faces, and the like. We can laugh at these distortions because they’re so obviously not created by human artists.
But consider a future in which more and more of the content we consume is created by generative AI systems. More and more content that is distorted – or at the very least, utterly bland. Content that is not very representative of real human creativity. Our collective culture becomes increasingly informed by AI-generated content instead of human-generated content. We end up stuck in a “bland AI echo chamber.” What would this mean for human culture? Is this a road we want to go down?
Are There Any Solutions?
One way forward is to design future AI systems so they can distinguish AI-generated content from human-generated content and, therefore, prioritize the latter for training purposes. But that’s easier said than done, because AIs find it surprisingly difficult to tell the difference between the two! Case in point: OpenAI’s “AI classifier” tool, which was introduced to distinguish AI-written text from text written by humans, was discontinued in 2023 because of its “low rate of accuracy.” And if OpenAI, arguably the leader in generative AI, is struggling, you know the problem must be pretty thorny. Still, provided the problem can be cracked, this is probably the most effective way to avoid inbreeding in the future.
We also need to avoid over-reliance on generative AI systems and continue to prioritize distinctly human skills like critical and creative thinking. We need to remember that generative AI is a tool, a hugely valuable one, but it’s no substitute for human creativity and culture.