Two Courts Rule On Generative AI and Fair Use

Two Courts Rule On Generative AI and Fair Use — One Gets It Right

www.eff.org

Things are speeding up in generative AI legal cases, with two judicial opinions just out on an issue that will shape the future of generative AI: whether training gen-AI models on copyrighted works is fair use. One gets it spot on; the other, not so much, but fortunately in a way that future courts can and should discount.

The core question in both cases was whether using copyrighted works to train Large Language Models (LLMs) used in AI chatbots is a lawful fair use. Under the US Copyright Act, answering that question requires courts to consider:

whether the use was transformative;

the nature of the works (Are they more creative than factual? Long since published?)

how much of the original was used; and

the harm to the market for the original work.

In both cases, the judges focused on factors (1) and (4).

The right approach

In Bartz v. Anthropic, three authors sued Anthropic for using their books to train its Claude chatbot. In his order deciding parts of the case, Judge William Alsup confirmed what EFF has said for years: fair use protects the use of copyrighted works for training because, among other things, training gen-AI is “transformative—spectacularly so” and any alleged harm to the market for the original is pure speculation. Just as copying books or images to create search engines is fair, the court held, copying books to create a new, “transformative” LLM and related technologies is also protected:

[U]sing copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use.

Importantly, Bartz rejected the copyright holders’ attempts to claim that any model capable of generating new written material that might compete with existing works by emulating their “sweeping themes, “substantive points,” or “grammar, composition, and style” was an infringement machine. As the court rightly recognized, building gen-AI models that create new works is beyond “anything that any copyright owner rightly could expect to control.”

There’s a lot more to like about the Bartz ruling, but just as we were digesting it Kadrey v. Meta Platforms came out. Sadly, this decision bungles the fair use analysis.

A fumble on fair use

Kadrey is another suit by authors against the developer of an AI model, in this case Meta’s ‘Llama’ chatbot. The authors in Kadrey asked the court to rule that fair use did not apply.

Much of the Kadrey ruling by Judge Vince Chhabria is dicta—meaning, the opinion spends many paragraphs on what it thinks could justify ruling in favor of the author plaintiffs, if only they had managed to present different facts (rather than pure speculation). The court then rules in Meta’s favor because the plaintiffs only offered speculation.

But it makes a number of errors along the way to the right outcome. At the top, the ruling broadly proclaims that training AI without buying a license to use each and every piece of copyrighted training material will be “illegal” in “most cases.” The court asserted that fair use usually won’t apply to AI training uses even though training is a “highly transformative” process, because of hypothetical “market dilution” scenarios where competition from AI-generated works could reduce the value of the books used to train the AI model..

That theory, in turn, depends on three mistaken premises. First, that the most important factor for determining fair use is whether the use might cause market harm. That’s not correct. Since its seminal 1994 opinion in Cambell v Acuff-Rose, the Supreme Court has been very clear that no single factor controls the fair use analysis.

Second, that an AI developer would typically seek to train a model entirely on a certain type of work, and then use that model to generate new works in the exact same genre, which would then compete with the works on which it was trained, such that the market for the original works is harmed. As the Kadrey ruling notes, there was no evidence that Llama was intended to to, or does, anything like that, nor will most LLMs for the exact reasons discussed in Bartz.

Third, as a matter of law, copyright doesn't prevent “market dilution” unless the new works are otherwise infringing. In fact, the whole purpose of copyright is to be an engine for new expression. If that new expression competes with existing works, that’s a feature, not a bug.

Gen-AI is spurring the kind of tech panics we’ve seen before; then, as now, thoughtful fair use opinions helped ensure that copyright law served innovation and creativity. Gen-AI does raise a host of other serious concerns about fair labor practices and misinformation, but copyright wasn’t designed to address those problems. Trying to force copyright law to play those roles only hurts important and legal uses of this technology.

In keeping with that tradition, courts deciding fair use in other AI copyright cases should look to Bartz, not Kadrey.