📚 When AI Eats Its Own Slop, It's Called Model Collapse

Rooted Communications | What Happens When AI Eats Its Slop?

Sam Chavez (she/they/he)

Welcome Back! 👋🏼

Our collaborator, Taryn Talley, is out with an interesting piece about the repercussions of AI slop on the AI models themselves. We've covered the harms of AI and alternatives to using AI in your work. Today, we ask, what happens when AI models eat their own slop? And how does that impact us? Read on for more...


Getting to the roots - navigating tech, media, & communications
The Tech News not fit to print in mainstream media!
  1. ๐Ÿ—ž๏ธ Amazon Ring Partners with FLOC (TechCrunch)
  2. ๐Ÿ—ž๏ธ Did Big Tech Enshitify The Entire Economy? (Cory Doctorow)
  3. ๐ŸŽ™๏ธ From TikTok Ban to MAGA Ownership (the roots of change podcast)
  4. ๐Ÿ—ž๏ธ Writing vs AI; What's the Difference? (Pluralistic)

Connect On & Off Big Tech

Mastodon โ€ข BlueSky โ€ข PeerTube โ€ข YouTube โ€ข Instagram โ€ข TikTok


Resources for the People. Resources and events to grow and root into your activism

More Rooted Resources

Resource Hub โ€ข Signal Channel โ€ข Podcast โ€ข Free Templates


Storytelling for Change - Actionable communications advice for advocacy, activism, and movements

What Happens When AI Eats Its Own Slop? It's Called Model Collapse.

from collaborator, Taryn Talley (she/her)

Just as people who live solely on highly processed foods risk poorer health outcomes, Large Language Models that ingest a non-stop diet of AI-generated content put the health of their training data at risk.

| Aspect | Ultra-Processed Food Risk | AI-Generated Data Risk |
| --- | --- | --- |
| Source | Producing over-processed "food" designed for efficiency and cost often results in a loss of nutritional value. | Content produced by algorithms that favor high-probability patterns loses the "long-tail" of human nuance. |
| Result | Short-term energy but long-term health decline (e.g., metabolic issues). | Models appear fluent at first, but over time reasoning, diversity, and accuracy "collapse." |
| Mechanism | The body lacks the complex micronutrients found in whole foods. | LLMs lack the "unlikely" but true edge cases that only human creativity and an error-prone life provide. |
| The Loop | A diet of ultra-processed food can lead to cravings for more of the same, reinforcing bad habits. | Models trained on AI-generated data start "hallucinating" on their own errors, which amplifies them. |

The Science Behind "Model Collapse"

A 2024 research paper published in Nature (Shumailov et al.) confirmed that when AI models are trained exclusively on data generated by previous AI models, they go through two specific stages (a toy simulation of the effect follows the list below):

  • Early Model Collapse: Models begin losing "minority" data - the rare, unique, and creative parts of human language. The model's output starts to sound "average" at best.
  • Late Model Collapse: The model starts confusing different concepts (e.g., answering a question about architecture with facts about biology) until every output is absolutely useless.
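
To see the first stage in miniature, here is a toy simulation - my own sketch, not code from the Shumailov et al. paper - of a "model" that only learns word frequencies from its training corpus and then writes the next generation's corpus itself. The vocabulary size, corpus size, and frequency-counting "model" are illustrative assumptions; the point is that a word the model never reproduces gets probability zero forever, so the rare long tail of the vocabulary shrinks generation after generation.

```python
# Toy model-collapse simulation (illustrative sketch, not the paper's code).
import numpy as np

rng = np.random.default_rng(42)

vocab_size = 2_000      # distinct "words" in the original human data
corpus_tokens = 20_000  # tokens in each generation's training corpus
generations = 20

# Generation 0: human-like data with a Zipf-style long tail of rare words.
weights = 1.0 / np.arange(1, vocab_size + 1)
probs = weights / weights.sum()

for gen in range(generations + 1):
    # "Generate" a corpus from the current distribution (gen 0 is the real data).
    corpus = rng.choice(vocab_size, size=corpus_tokens, p=probs)
    # "Train" the next model: estimate word probabilities by counting.
    counts = np.bincount(corpus, minlength=vocab_size)
    if gen % 5 == 0:
        print(f"gen {gen:2d}: distinct words left in the corpus = {int((counts > 0).sum())}")
    # The next generation trains only on what this model produced.
    probs = counts / counts.sum()
```

Run it and the count of distinct surviving words never goes up: the rarest words vanish first, which is exactly the "minority data" loss described above.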

As that research paper circulated, the term 'model collapse' began to gain traction, prompting the companies behind the top LLMs to shift their strategies. OpenAI and Google began prioritizing content licensing to ensure access to "clean" human-generated data - and no doubt to limit their future liability, having learned from their initial capture of copyrighted material (without citation or compensation). These same companies also sought to preserve "pre-AI internet" data (created before late 2022) for future training.

According to a Gemini (1) prompt response:

As of 2026, the industry is seeing three specific areas where collapse is manifesting:

| Sign of Collapse | Real-World Observation |
| --- | --- |
| The "Tail" Vanishing | Models are becoming less capable of discussing rare languages, niche scientific theories, or ultra-specific coding edge cases. They default to the "average" answer more often than they did in 2023. |
| Bias Amplification | Since AI data reinforces majority patterns, models are showing increased "homogenization." They sound more like "the average of the internet," losing the unique voices and cultural nuances found in the original human-only datasets. |
| "Digital Dementia" | In recursive testing (feeding a model its own output repeatedly), models like Meta's OPT-125M eventually began babbling about "jack rabbits" after starting with a prompt about architecture. While flagship models are more stable, they still show slight degradation when exposed to "AI slop" on the web. |

What do the big three say about their teams' efforts to prevent model collapse?

I wanted to share the perspective of the top LLMs. So I asked Gemini, Claude, and ChatGPT the following question: "Hi (LLM), what steps have your engineers taken to prevent the degradation that leads to late-stage model collapse?"

Not surprisingly, Gemini's response was much more robust than the other LLMs'. ChatGPT came in second with a decent but high-level response. Claude's was by far the most underwhelming. So, let's look at the techniques the top three are currently employing to prevent model collapse. I've also noted which LLM mentioned which technique in its initial response.

Data Provenance and "The Vault" Strategy

In my research for this article, I've encountered the terms "gold standard data" and "pristine gold dataset" multiple times. It makes sense: to prevent the ingestion of AI slop, these companies need to maintain purely human-generated content as protected source data, reducing the risk of AI-polluted web scrapes.
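
As a rough illustration of what that "vault" strategy could look like in practice, here is a minimal sketch. The record fields, source labels, and the late-2022 cutoff are my own illustrative assumptions, not anything published by OpenAI or Google; the idea is simply that provenance is decided at ingestion time and only trusted, human-generated material is admitted into the protected training set.

```python
# Minimal "gold standard data" filter (illustrative sketch; field names and
# labels are assumptions, not any vendor's real pipeline).
from dataclasses import dataclass
from datetime import date

AI_ERA_CUTOFF = date(2022, 11, 1)  # rough start of the "post-AI" public internet
TRUSTED_SOURCES = {"licensed_publisher", "verified_human_archive"}  # hypothetical labels

@dataclass
class Document:
    text: str
    created_at: date
    source: str  # provenance label attached when the document is ingested

def is_gold_standard(doc: Document) -> bool:
    """Admit a document to the vault only if its provenance is trusted."""
    return doc.created_at < AI_ERA_CUTOFF or doc.source in TRUSTED_SOURCES

docs = [
    Document("A 2019 field guide to wildflowers.", date(2019, 5, 2), "web_crawl"),
    Document("A 2024 blog post of unknown origin.", date(2024, 3, 9), "web_crawl"),
    Document("A 2024 newsroom archive under license.", date(2024, 6, 1), "licensed_publisher"),
]

vault = [d for d in docs if is_gold_standard(d)]
print(f"{len(vault)} of {len(docs)} documents admitted to the vault")
```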

Continue to full article...

🌺 for Flowering Members

This post is for subscribers on the 🌺 Flowering (Tech Geeks & Communicators+) and 🍓 Fruitful (Tech Geeks & Communicators+) tiers
