
Incident 624: Child Sexual Abuse Material Taints Image Generators

Status: Responded
Description: Researchers found that the LAION-5B dataset (a widely used training set of more than 5 billion image-description pairs) contains child sexual abuse material (CSAM), which increases the likelihood that models trained on it will produce CSAM imagery. The discovery taints models built with the LAION dataset, requiring many organizations to retrain them; additionally, LAION must now scrub the dataset of the imagery.


Entities

Alleged: LAION developed an AI system deployed by Various people and Various organizations, which harmed Various people, Various organizations, LAION, General public, and Children.

Incident Stats

Incident ID: 624
Report Count: 18
Incident Date: 2023-12-20
Editors:
Applied Taxonomies: MIT

MIT Taxonomy Classifications

Machine-Classified
Taxonomy Details

Risk Subdomain

A further 23 subdomains create an accessible and understandable classification of hazards and harms associated with AI.

2.1. Compromise of privacy by obtaining, leaking or correctly inferring sensitive information

Risk Domain

The Domain Taxonomy of AI Risks classifies risks into seven AI risk domains: (1) Discrimination & toxicity, (2) Privacy & security, (3) Misinformation, (4) Malicious actors & misuse, (5) Human-computer interaction, (6) Socioeconomic & environmental harms, and (7) AI system safety, failures & limitations.

Privacy & Security

Entity

Which, if any, entity is presented as the main cause of the risk.

Human

Timing

The stage in the AI lifecycle at which the risk is presented as occurring.

Pre-deployment

Intent

Whether the risk is presented as occurring as an expected or unexpected outcome from pursuing a goal.

Unintentional
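
For illustration, the machine-applied MIT classification above can be encoded as a structured record. The following TypeScript sketch is hypothetical: the type and field names are assumptions made for clarity, not the database's actual schema.

```typescript
// Hypothetical record type for an MIT AI Risk taxonomy classification.
// Field names and value unions are illustrative assumptions, not the
// AIID's actual schema.
interface MitRiskClassification {
  incidentId: number;
  riskDomain: string;     // one of the seven AI risk domains
  riskSubdomain: string;  // one of the 23 subdomains, e.g. "2.1"
  entity: "Human" | "AI"; // entity presented as the main cause of the risk
  timing: "Pre-deployment" | "Post-deployment" | "Other";
  intent: "Intentional" | "Unintentional" | "Other";
  machineClassified: boolean;
}

// The classification shown on this page, encoded as data:
const incident624: MitRiskClassification = {
  incidentId: 624,
  riskDomain: "Privacy & Security",
  riskSubdomain:
    "2.1. Compromise of privacy by obtaining, leaking or correctly inferring sensitive information",
  entity: "Human",
  timing: "Pre-deployment",
  intent: "Unintentional",
  machineClassified: true,
};
```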

Incident Reports


Safety Review for LAION 5B
laion.ai · 2023
LAION.ai post-incident response

There have been reports in the press about the results of a research project at Stanford University, according to which the LAION training set 5B contains potentially illegal content in the form of CSAM. We would like to comment on this as …

Investigation Finds AI Image Generation Models Trained on Child Abuse
cyber.fsi.stanford.edu · 2023

A Stanford Internet Observatory (SIO) investigation identified hundreds of known images of child sexual abuse material (CSAM) in an open dataset used to train popular AI text-to-image generation models, such as Stable Diffusion.

A previous …

AI image training dataset found to include child sexual abuse imagery
theverge.com · 2023

A popular training dataset for AI image generation contained links to child abuse imagery, Stanford’s Internet Observatory found, potentially allowing AI models to create harmful content.  

LAION-5B, a dataset used by Stable Diffusion creat…

Study uncovers presence of CSAM in popular AI training dataset
theregister.com · 2023

A massive public dataset that served as training data for a number of AI image generators has been found to contain thousands of instances of child sexual abuse material (CSAM).

In a study published today, the Stanford Internet Observatory …

Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material
404media.co · 2023

This piece is published with support from The Capitol Forum.

The LAION-5B machine learning dataset used by Stable Diffusion and other major AI products has been removed by the organization that created it after a Stanford study found that i…

A free AI image dataset, removed for child sex abuse images, has come under fire before
venturebeat.com · 2023

A massive open-source AI dataset, LAION-5B, which has been used to train popular AI text-to-image generators like Stable Diffusion 1.5 and Google's Imagen, contains at least 1,008 instances of child sexual abuse material, a new report from …

Researchers found child abuse material in the largest AI image generation dataset
engadget.com · 2023

Researchers from the Stanford Internet Observatory say that a dataset used to train AI image generation tools contains at least 1,008 validated instances of child sexual abuse material. The Stanford researchers note that the presence of CSA…

Stable Diffusion 1.5 Was Trained On Illegal Child Sexual Abuse Material, Stanford Study Says
forbes.com · 2023

Stable Diffusion, one of the most popular text-to-image generative AI tools on the market from the $1 billion startup Stability AI, was trained on a trove of illegal child sexual abuse material, according to new research from the Stanford I…

AI Training Data Contains Child Sexual Abuse Images, Discovery Points to LAION-5B
techtimes.com · 2023

There have been significant problems with AI's training data, with various complaints already filed by those who claimed their work was stolen, but the most recent discovery saw child sexual abuse images in their dataset. In a recent study,…

Child Sex Abuse Material Was Found In a Major AI Dataset. Researchers Aren’t Surprised.
vice.com · 2023

Over 1,000 images of sexually abused children have been discovered inside the largest dataset used to train image-generating AI, shocking everyone except for the people who have warned about this exact sort of thing for years.

The dataset w…

An Influential AI Dataset Contains Thousands of Suspected Child Sexual Abuse Images
gizmodo.com · 2023

An influential machine learning dataset—the likes of which has been used to train numerous popular image-generation applications—includes thousands of suspected images of child sexual abuse, a new academic report reveals.

The report, put to…

Large AI training data set removed after study finds child abuse material
cointelegraph.com · 2023

A widely-used artificial intelligence data set used to train Stable Diffusion, Imagen and other AI image generator models has been removed by its creator after a study found it contained thousands of instances of suspected child sexual abus…

Abuse material found in openly accessible data set
cybernews.com · 2023

Child sexual abuse material (CSAM) has been located in LAION, a major data set used to train AI.

The Stanford Internet Observatory revealed thousands of images of child sexual abuse in the LAION-5B data set, which supports many different AI…

Major Error Found in Stable Diffusion’s Biggest Training Dataset
analyticsvidhya.com · 2023

The integrity of a major AI image training dataset, LAION-5B, utilized by influential AI models like Stable Diffusion, has been compromised after the discovery of thousands of links to Child Sexual Abuse Material (CSAM). This revelation has…

LAION and the Challenges of Preventing AI-Generated CSAM
techpolicy.press · 2024

Generative AI has been democratized. The toolkits to download, set up, use, and fine-tune a variety of models have been turned into one-click frameworks for anyone with a laptop to use. While this technology allows users to generate and exp…

LAION-5B, Stable Diffusion 1.5, and the Original Sin of Generative AI
techpolicy.press · 2024

In The Ones Who Walk Away From Omelas, the fiction writer Ursula K. Le Guin describes a fantastic city wherein technological advancement has ensured a life of abundance for all who live there. Hidden beneath the city, where nobody needs to …

Was an AI Image Generator Taken Down for Making Child Porn?
spectrum.ieee.org · 2024
David Evan Harris, Dave Willner post-incident response

Why are AI companies valued in the millions and billions of dollars creating and distributing tools that can make AI-generated child sexual abuse material (CSAM)?

An image generator called Stable Diffusion version 1.5, which was created by …

Child abuse images removed from AI image-generator training source, researchers say
apnews.com · 2024

Artificial intelligence researchers said Friday they have deleted more than 2,000 web links to suspected child sexual abuse imagery from a dataset used to train popular AI image-generator tools.

The LAION research dataset is a huge index of…

Variants

A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database. Learn more from the research paper.

Similar Incidents

By textual similarity

Did our AI mess up? Flag the unrelated incidents

  • DALL-E 2 Reported for Gender and Racially Biased Outputs (Apr 2022 · 3 reports)
  • Sexist and Racist Google Adsense Advertisements (Jan 2013 · 27 reports)
  • Facebook’s Political Ad Detection Reportedly Showed High and Geographically Uneven Error Rates (Jul 2020 · 5 reports)

