'Model Collapse': Future AI generations may start speaking gibberish you don't understand

As AI-generated content becomes more prevalent and is added to the vast pool of training data, errors and nonsensical instances accumulate

New Delhi,
Updated Jun 20, 2023 5:53 PM IST



Artificial intelligence (AI) trained on AI-generated content may start producing nonsensical output after a few generations, scientists warn. Researchers from the University of Cambridge and the University of Oxford conducted a study to explore the consequences of training AIs on AI-generated material. They found that text and images generated by AIs become progressively degraded and lose intelligibility over generations of training, a phenomenon they called "model collapse."

As AI-generated content becomes more prevalent and is added to the vast pool of training data, errors and nonsensical output accumulate. Later AIs struggle to distinguish fact from fiction, so they misread the data and reinforce their own mistakes. The scientists compare the process to Mozart and Salieri: each subsequent generation loses more of the original brilliance and produces less impressive output.

Dr. Ilia Shumailov, the lead author of the study, explains that the problem lies in an AI's perception of probability. With each training iteration, improbable events become less likely in the AI's output, limiting the range of possibilities understood by subsequent AIs trained on that output.
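The effect Shumailov describes can be illustrated with a toy simulation. The sketch below is not the researchers' method; it assumes a made-up "vocabulary" of two common and ten rare tokens, and every name in it (`next_generation`, `simulate`, the token labels, the sample size) is hypothetical. Each generation draws a finite sample from the previous model and refits by counting frequencies. Any rare token that happens not to be drawn gets probability zero and can never come back, so the range of possible outputs can only stay the same or shrink.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """Sample from the current model, then refit it by counting frequencies."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    samples = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(samples)
    # Tokens that were never sampled are absent here: the next model
    # assigns them zero probability and cannot generate them again.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(generations=10, sample_size=200, seed=0):
    """Return the number of distinct tokens the model can emit per generation."""
    rng = random.Random(seed)
    # Assumed starting distribution: two common tokens, ten rare ones.
    probs = {"common_a": 0.45, "common_b": 0.45}
    probs.update({f"rare_{i}": 0.01 for i in range(10)})
    support_sizes = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        support_sizes.append(len(probs))
    return support_sizes


if __name__ == "__main__":
    # Prints one count per generation; the sequence never increases.
    print(simulate())
```

Real language models are far more complex than a frequency table, but the same one-way loss of improbable events is what the paragraph above describes.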

Ross Anderson, one of the scientists working on this paper, explained the issue using the example of pollution. Anderson said, "Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah. This will make it harder to train newer models by scraping the web, giving an advantage to firms that already did that, or that control access to human interfaces at scale. Indeed, we already see AI startups hammering the Internet Archive for training data."

The researchers also demonstrated this phenomenon using an example of training an AI language model on a text about medieval architecture. After multiple generations of training, the output degraded to meaningless text about jackrabbits instead of architectural theories.

Anderson, in his blog, concluded by comparing LLMs to fire. He said, "So there we have it. LLMs are like fire – a useful tool, but one that pollutes the environment. How will we cope with it?"



Published on: Jun 20, 2023 4:45 PM IST
