2024-02-29
By Alison Frankel

Feb 29 (Reuters) - In the plethora of copyright lawsuits against artificial intelligence developers, a pair of complaints filed on Wednesday against OpenAI and related defendants stands out.

Unlike most of the authors, artists and news organizations that have sued AI developers, The Intercept Media and Raw Story Media are not alleging straightforward copyright infringement claims. The media companies are instead asserting only that OpenAI and its co-defendants violated the Digital Millennium Copyright Act, or DMCA, deliberately undermining their copyrights by stripping identifying information out of articles used to train the AI system behind the popular chatbot ChatGPT.

As my Reuters colleague Blake Brittain reported on Wednesday, the 1998 federal DMCA statute prohibits the removal of information that can help copyright holders detect infringement, including article titles, author names and copyright dates.

The new Intercept and Raw Story lawsuits posit that OpenAI intentionally stripped that information out of training materials so that ChatGPT’s responses to user prompts would create the impression that “it is an all-knowing, ‘intelligent’ source of the information being provided, when in reality, the responses are frequently based on copyrighted works of journalism that ChatGPT simply mimics.”

In a press release that accompanied the new lawsuits, plaintiffs' lawyers from Loevy & Loevy said their DMCA theory offers an alternative path to recovery that does not require news organizations to shell out money to register their copyrights.

But the theory also allows plaintiffs to circumvent a primary defense argument in copyright cases against AI developers: that AI training materials make fair use of copyrighted content to create new products that do more than simply recreate the protected works.

Fair use, as Brittain has reported for Reuters, has been a centerpiece of AI defense arguments. You can see as much from the brief OpenAI filed this week in a copyright infringement case brought by The New York Times. OpenAI’s lawyers at Morrison & Foerster called for the case to be tossed, arguing that fair use principles will ultimately vindicate AI training with copyrighted materials because “it is perfectly lawful to use copyrighted content as part of a technological process that (as here) results in the creation of new, different, and innovative products.”

Or, as OpenAI argued flatly in a 2023 motion to dismiss most of a San Francisco federal court case by authors including the comedian and actress Sarah Silverman: “Fair use is not infringement.”

By claiming DMCA violations rather than traditional infringement, The Intercept and Raw Story likely won’t face those arguments, according to their counsel. “We believe the fair use defense would not apply to our cases,” said plaintiffs' lawyer Jon Loevy.

That's the good news for Loevy and his clients. The bad news: OpenAI and other defendants have lots of other arguments to combat DMCA allegations.

Neither OpenAI nor Morrison & Foerster responded to my query about the latest lawsuits, which were filed in Manhattan federal court. Microsoft (MSFT.O), which is named as a defendant in The Intercept’s complaint, also did not respond to a request for comment. But, as I’ll explain, OpenAI and other artificial intelligence developers have previously addressed DMCA allegations in other cases – and have persuaded judges in two California suits to curtail the claims.

The Intercept and Raw Story, it turns out, are not the only plaintiffs to assert DMCA claims against AI developers, although they appear to be the first to allege only DMCA violations. The New York Times, the authors who sued OpenAI in San Francisco federal court, and three artists who sued image generator Stability AI all included DMCA allegations in their lawsuits, albeit as secondary to their copyright infringement claims. (Interestingly, a separate group of authors who sued OpenAI in Manhattan federal court did not bring DMCA claims in their Feb. 5 consolidated complaint.)

AI defendants have primarily argued in motions to dismiss DMCA claims that plaintiffs cannot show AI developers deliberately stripped copyright-identifying information from the content that trained their models. The statute requires plaintiffs to prove that defendants intentionally deleted identifying information in order to make it harder for copyright holders to track infringement. (The classic example is an internet site that crops a photographer’s name and copyright from a misappropriated image.) OpenAI and other defendants contend they did not use such subterfuge in training their models.

They’ve also argued that plaintiffs can’t show the developers intended to omit copyright-identifying information from their output. In this week’s motion to dismiss The New York Times suit, for instance, OpenAI said ChatGPT, at worst, reproduces excerpts or snippets from Times stories. It does not reproduce entire stories shorn only of identifying copyright information, OpenAI said – which is the conduct Congress had in mind when it enacted the DMCA.

Out of curiosity, I asked ChatGPT to write a brief moving for the dismissal of DMCA claims. The chatbot larded its response with warnings that it could not provide legal advice but suggested arguments that plaintiffs did "not provide specific details regarding the alleged removal" of copyright-identifying information and that any removal of such information "was unintentional and without malice."

OpenAI noted in its dismissal brief in the Times case that two federal judges have already endorsed AI defendants’ DMCA arguments. Last October, U.S. District Judge William Orrick of San Francisco ruled that the artists who sued Stability AI failed to allege the removal of copyright-identifying information from any specific works used to train the AI model. It was not enough, Orrick said, to claim wholesale alteration of images to delete information such as watermarks and artists' signatures. Plaintiffs, he said, had to allege specific alterations of particular works.

Similarly, U.S. District Judge Araceli Martinez-Olguin of San Francisco dismissed authors’ DMCA claims against OpenAI earlier this month, holding that plaintiffs failed to show that defendants systematically deleted copyright-identifying information from training content.

Martinez-Olguin said the premise was undermined by ChatGPT’s output, which included references to authors’ names and book titles. The judge also said it was significant, in the context of DMCA claims, that the authors’ books were not reproduced in full in ChatGPT responses.

Journalists who claim that ChatGPT essentially plagiarizes their work in its entirety presumably won’t have that problem, but precedent suggests The Intercept and Raw Story will have to come up with plausible allegations that OpenAI removed or altered information from training content specifically to hide its reliance on copyrighted material.

That won’t be easy.

(Reporting By Alison Frankel)

