OpenAI has been sued by a YouTuber whose videos were transcribed and used to train its AI system, opening a new front in the legal battle against companies leading development of the technology.
With the lawsuit, creators on YouTube join sprawling litigation over the unauthorized use of copyrighted material to power ChatGPT. Creators who've initiated legal action against AI firms include artists, authors, news publishers and record labels.
The complaint, brought by David Millette on Friday in federal court in San Francisco, builds on a report from The New York Times published in April on OpenAI's creation of a speech recognition system called Whisper. Faced with a supply problem in late 2021 after exhausting nearly every reservoir of text on the internet, the Sam Altman-led company allegedly built the tool to transcribe audio from YouTube videos, with the aim of training the next version of GPT. According to the complaint, OpenAI used Whisper to transcribe more than one million hours of video from YouTube in violation of the platform's terms of service, which bar people from using its content for independent applications and from accessing services by "automated means (such as robots, botnets or scrapers)." Greg Brockman, president and one of the 11 cofounders of the company (who has also taken a leave of absence), is listed as a creator of Whisper in a research paper.
"OpenAI's Language Models datasets include transcriptions of videos taken directly from YouTube, because these video transcriptions are one of the largest corpora of natural language data available for training and fine-tuning the OpenAI Language Models," states the complaint.
Some Google employees were aware that OpenAI harvested YouTube videos for training data but didn't take action since the Alphabet-owned company was doing the same to develop its own AI system, according to the Times report. If Google called out OpenAI for possibly violating the copyrights of YouTube creators, it could face similar blowback, the report said, citing people with knowledge of the situation.
Notably, Millette doesn't bring a claim for copyright infringement and only alleges unjust enrichment and unfair competition over the use of video transcripts without consent or compensation. He seeks at least $5 million and a court order blocking OpenAI from further using his content.
A federal judge overseeing a lawsuit from top authors against OpenAI on July 30 dismissed a claim accusing the company of violating California's unfair competition law, the same claim advanced by Millette. U.S. District Judge Araceli Martínez-Olguín found that federal law bars the claim since it relates to material within the subject matter of copyright, though she grounded some of her reasoning in the fact that it overlaps with a claim for direct copyright infringement, which wasn't alleged in the class action seeking to represent YouTubers.