Writers

Incidents Harmed By

Incident 997 (4 reports)
Meta and OpenAI Accused of Using LibGen’s Pirated Books to Train AI Models

2023-02-28

Court records reveal that Meta employees allegedly discussed pirating books to train LLaMA 3, citing cost and speed concerns with licensing. Internal messages suggest Meta accessed LibGen, a repository of over 7.5 million pirated books, with apparent approval from Mark Zuckerberg. Employees allegedly took steps to obscure the dataset’s origins. OpenAI has also been implicated in using LibGen.

Incident 996 (3 reports)
Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

2020-10-25

Meta and Bloomberg allegedly used Books3, a dataset containing 191,000 pirated books, to train their AI models, including LLaMA and BloombergGPT, without author consent. Lawsuits from authors such as Sarah Silverman and Michael Chabon claim this constitutes copyright infringement. Books3 includes works from major publishers like Penguin Random House and HarperCollins. Meta argues its AI outputs are not "substantially similar" to the original books, but legal challenges continue.

Incident 995 (2 reports)
The New York Times Sues OpenAI and Microsoft Over Alleged Unauthorized AI Training on Its Content

2023-12-27

The New York Times alleges that OpenAI and Microsoft used millions of its articles without permission to train AI models, including ChatGPT. The lawsuit claims the companies scraped and reproduced copyrighted content without compensation, in turn undermining the Times’s business and competing with its journalism. Some AI outputs allegedly regurgitate Times articles verbatim. The lawsuit seeks damages and demands the destruction of AI models trained on its content.

Related Entities

Other entities that are related to the same incident. For example, if the developer of an incident is this entity but the deployer is another entity, they are marked as related entities.

OpenAI

Incidents involved as both Developer and Deployer

Meta

Incidents involved as both Developer and Deployer

publishers

Incidents Harmed By

Journalists

Incidents Harmed By

Authors

Incidents Harmed By

Academic researchers

Incidents Harmed By

OpenAI models

Incidents implicated systems

Llama 3

Incidents implicated systems

Library Genesis (LibGen)

Incidents implicated systems

GPT-4

Incidents implicated systems

BitTorrent

Incidents implicated systems

Microsoft

Incidents involved as both Developer and Deployer

The New York Times

Incidents Harmed By

Journalism

Incidents Harmed By

Media organizations

Incidents Harmed By

ChatGPT

Incidents implicated systems

Microsoft Bing Chat

Incidents implicated systems

Various generative AI developers

Incidents involved as both Developer and Deployer

EleutherAI

Incidents involved as both Developer and Deployer

Bloomberg

Incidents involved as both Developer and Deployer

The Pile

Incidents involved as Developer

Incidents implicated systems

Shawn Presser

Incidents involved as Developer

Zadie Smith

Incidents Harmed By

Verso

Incidents Harmed By

Stephen King

Incidents Harmed By

Sarah Silverman

Incidents Harmed By

Richard Kadrey

Incidents Harmed By

Publishers found in Books3

Incidents Harmed By

Penguin Random House

Incidents Harmed By

Oxford University Press

Incidents Harmed By

Over 170,000 authors found in Books3

Incidents Harmed By

Michael Pollan

Incidents Harmed By

Margaret Atwood

Incidents Harmed By

Macmillan

Incidents Harmed By

HarperCollins

Incidents Harmed By

General public

Incidents Harmed By
