AI Incident Database
Entities

Writers

Incidents Harmed By

Incident 997
4 Reports
Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

2023-02-28

Court records reveal that Meta employees allegedly discussed pirating books to train LLaMA 3, citing cost and speed concerns with licensing. Internal messages suggest Meta accessed LibGen, a repository of over 7.5 million pirated books, with apparent approval from Mark Zuckerberg. Employees allegedly took steps to obscure the dataset’s origins. OpenAI has also been implicated in using LibGen.


Incident 995
2 Reports
The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

2023-12-27

The New York Times alleges that OpenAI and Microsoft used millions of its articles without permission to train AI models, including ChatGPT. The lawsuit claims the companies scraped and reproduced copyrighted content without compensation, in turn undermining the Times’s business and competing with its journalism. Some AI outputs allegedly regurgitate Times articles verbatim. The lawsuit seeks damages and demands the destruction of AI models trained on its content.


Incident 996
2 Reports
Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

2020-10-25

Meta and Bloomberg allegedly used Books3, a dataset containing 191,000 pirated books, to train their AI models, including LLaMA and BloombergGPT, without author consent. Lawsuits from authors such as Sarah Silverman and Michael Chabon claim this constitutes copyright infringement. Books3 includes works from major publishers like Penguin Random House and HarperCollins. Meta argues its AI outputs are not "substantially similar" to the original books, but legal challenges continue.


Related Entities
Other entities that are related to the same incident. For example, if the developer of an incident is this entity but the deployer is another entity, they are marked as related entities.
 

Entity

OpenAI

Incidents involved as both Developer and Deployer
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Microsoft

Incidents involved as both Developer and Deployer
  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

The New York Times

Incidents Harmed By
  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Journalists

Incidents Harmed By
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Journalism

Incidents Harmed By
  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Media organizations

Incidents Harmed By
  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Publishers

Incidents Harmed By
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

ChatGPT

Incidents implicated systems
  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

GPT-4

Incidents implicated systems
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Microsoft Bing Chat

Incidents implicated systems
  • Incident 995
    2 Reports

    The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

Entity

Various generative AI developers

Incidents involved as both Developer and Deployer
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Meta

Incidents involved as both Developer and Deployer
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

EleutherAI

Incidents involved as both Developer and Deployer
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Bloomberg

Incidents involved as both Developer and Deployer
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

The Pile

Incidents involved as Developer
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Shawn Presser

Incidents involved as Developer
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Zadie Smith

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Verso

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Stephen King

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Sarah Silverman

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Richard Kadrey

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Publishers found in Books3

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Penguin Random House

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Oxford University Press

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Over 170,000 authors found in Books3

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Michael Pollan

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Margaret Atwood

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Macmillan

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

HarperCollins

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

General public

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Creative industries

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Christopher Golden

Incidents Harmed By
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Authors

Incidents Harmed By
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

LLaMA

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Hugging Face

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

GPT-J

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Books3

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

BloombergGPT

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Bibliotik

Incidents implicated systems
  • Incident 996
    2 Reports

    Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Entity

Academic researchers

Incidents Harmed By
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

Entity

OpenAI models

Incidents implicated systems
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

Entity

Llama 3

Incidents implicated systems
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

Entity

Library Genesis (LibGen)

Incidents implicated systems
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

Entity

BitTorrent

Incidents implicated systems
  • Incident 997
    4 Reports

    Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

