
Meta Platforms, previously known as Facebook, faces mounting legal trouble over allegations that it used thousands of pirated books to train its AI models despite warnings from its own legal team. The allegations, detailed in a recent court filing in a copyright infringement lawsuit, are part of a contentious battle between prominent authors and the tech giant.
Comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon, among others, have united against Meta, asserting that their works were unlawfully employed by the company to train its artificial-intelligence language model, Llama. The latest legal submission consolidates these claims, highlighting Meta's alleged disregard for copyright permissions in its pursuit of advancing AI technology.
The filing presents chat logs in which a Meta-affiliated researcher discusses acquiring the dataset in a Discord server. The logs are potential evidence that Meta was aware its use of the book files might not be lawful.
According to a Reuters report, the conversation cited in the complaint shows researcher Tim Dettmers describing his back-and-forth with Meta's legal department over whether using the book files for training would be legally permissible. Dettmers' messages point to internal debate within Meta about whether the dataset could be used and suggest the company recognised the legal uncertainty surrounding it.
While the specifics of the lawyers' concerns remain undisclosed, "books with active copyrights" are identified in the chat as the primary source of apprehension. Participants in the chat also discuss whether training on such data is covered by fair use, a legal doctrine that protects certain unlicensed uses of copyrighted works.
Meta's release earlier this year of the Llama large language model, purportedly trained on the controversial dataset, has stirred uproar among content creators. With tech companies facing an onslaught of lawsuits alleging unauthorised use of copyrighted material to build AI systems, the outcome of these legal battles could significantly shape the future of generative AI.
In February, Meta unveiled the first version of its Llama large language model, along with a list of the datasets used to train it. The list included "the Books3 section of ThePile," a dataset that comprises 196,640 books, according to the legal filing. However, Meta did not disclose the training data used for its latest version, Llama 2, which became commercially available over the summer. Llama 2 is free to use for companies with fewer than 700 million monthly active users.
(With Reuters input)
Copyright©2025 Living Media India Limited. For reprint rights: Syndications Today