The Intersection of Generative AI and Copyright Law: Navigating the Fair Use Doctrine in the Digital Age

Author: Prachi Talekar 

College: K.G Shah Law School, SNDT University

LinkedIn: https://www.linkedin.com/in/prachi-talekar-b97941240?utm_source=share_via&utm_content=profile&utm_medium=member_ios

 

 

Abstract

 

This article examines the fast-moving field of generative AI and its intersection with copyright law. As AI models are increasingly trained on massive datasets that include copyrighted content, a central question emerges: does scraping copyrighted material without authorization to train AI constitute copyright infringement, or is it protected as fair use? Through a study of current court precedents and the technology involved, this article focuses on transformative use, the market for human creators’ works, and future regulatory frameworks.

 

To the Point

 

The crux of the legal dispute is the unauthorized copying of copyrighted material to train Large Language Models (LLMs). The plaintiffs (authors and publishers) claim that this amounts to theft of their intellectual property. The defendants (AI developers) counter that training is a transformative act in which uncopyrightable statistical patterns are extracted from the works and combined to create something entirely new.

The answer depends on how courts apply the four-factor fair use test and on the extent to which AI output competes in the market with the original works.

 

Use of Legal Jargon

 

A proper analysis of this dispute requires applying several core legal principles:

 

De Minimis Non Curat Lex: A Latin legal maxim meaning “the law is not concerned with trifles.” AI developers sometimes argue that adding a single work to a training dataset containing billions of works is de minimis.

 

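Transformative Use: The central inquiry under the first fair use factor (purpose and character of the use), established as such in Campbell v. Acuff-Rose Music, Inc. (510 U.S. 569, 1994). A use is transformative when it adds something new, with a further purpose or different character, rather than merely superseding the original.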

 

Prima Facie Infringement: A case established on first impression, in which the plaintiff must show ownership of a valid copyright and copying of protected elements of the work.

 

Derivative Work: A work based on one or more preexisting works. Many creators argue that AI models, or their outputs, are derivative works.

 

The Four-Factor Test (17 U.S.C. § 107):

 

The statutory framework for determining whether a use is fair, which weighs:

 

1. Purpose and character of the use, including whether it is commercial or for nonprofit educational purposes.

 

2. Nature of the copyrighted work.

 

3. Amount and substantiality of the portion used in relation to the copyrighted work as a whole.

 

4. Effect of the use upon the potential market for or value of the copyrighted work.

 

Case Law

 

1. Authors Guild v. Google, Inc. (804 F.3d 202, 2d Cir. 2015)

Background: Google digitized millions of books, many of them copyrighted, to build a searchable index and allow users to view limited “snippets” of text.

 

Decision: The Second Circuit held that Google’s copying was a transformative fair use because it served a new purpose (making books searchable and providing information about them) without offering a meaningful market substitute for the originals. AI companies widely cite this decision in defense of their data-scraping practices.

 

2. Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith (598 U.S. 508, 2023)

 

Background: Warhol created a series of silkscreen prints based on Lynn Goldsmith’s photograph of the musician Prince. The Supreme Court considered whether the Andy Warhol Foundation’s licensing of one of those prints to a magazine was transformative.

 

Decision: The Supreme Court ruled for Goldsmith, significantly narrowing what counts as transformative. The Court held that where the original work and the secondary use share substantially the same purpose and the secondary use is commercial, the first factor is likely to weigh against fair use. This weakens AI developers’ broad “transformative” defense wherever AI outputs mimic the original creators’ works or substitute for them in the market.

The Proof

The technical reality of AI training shows that models depend on copying the expressive content of human works. Pre-training an AI model involves downloading, processing, and copying texts or images into memory in order to compute and adjust the model’s numerical weights.
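Assuming the standard next-token prediction objective used to pre-train most LLMs (the article does not specify an exact formulation), this optimization can be written as follows, where N, an assumed symbol, is the number of tokens in the training corpus:

$$
\theta^{*} = \arg\min_{\theta} \; -\sum_{i=1}^{N} \log P_{\theta}\left(x_i \mid x_1, \ldots, x_{i-1}\right)
$$

Each term of the sum is evaluated on an actual token x_i drawn from the training text, which is why the works themselves must be reproduced in memory during training.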

 

In this process, the model’s parameters, denoted θ, are optimized against the tokenized human expression (x_i) contained in the training data. In addition, researchers have observed “memorization,” in which LLMs reproduce passages verbatim from copyrighted books when prompted, and image-generation models reproduce distinctive watermarks from stock-photo libraries.
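The memorization concern can be made concrete. The sketch below is a hypothetical illustration, not a forensic or legal standard: it assumes simple whitespace tokenization and uses two invented helpers, `longest_verbatim_run` and `ngram_overlap`, to measure how much of a model’s output repeats a source passage word for word.

```python
"""Hypothetical sketch: measuring verbatim overlap between a source text
and a model output. Whitespace tokenization is a simplifying assumption."""


def longest_verbatim_run(source: str, output: str) -> int:
    """Length (in tokens) of the longest run of consecutive tokens that
    appears word for word in both texts (longest-common-substring DP)."""
    a, b = source.split(), output.split()
    best = 0
    # prev[j]: run length ending at the previous source token and b[j - 1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the preceding pair of tokens.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def ngram_overlap(source: str, output: str, n: int = 8) -> float:
    """Share of the output's distinct n-token sequences that also occur
    in the source (0.0 if the output has fewer than n tokens)."""

    def ngrams(tokens: list[str]) -> set[tuple[str, ...]]:
        return {tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)}

    out_grams = ngrams(output.split())
    if not out_grams:
        return 0.0
    return len(out_grams & ngrams(source.split())) / len(out_grams)


if __name__ == "__main__":
    source = "it was the best of times it was the worst of times"
    output = "the model wrote: it was the best of times it was the worst"
    print(longest_verbatim_run(source, output))  # 10
    print(ngram_overlap(source, output, n=4))  # 0.7
```

A long verbatim run or a high overlap score indicates that the output repeats the source’s expression rather than merely reflecting abstract statistics; which threshold, if any, would matter legally is not something this sketch decides.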

 

Conclusion 

 

This legal debate will shape the contours of intellectual property law. Although training an AI model technically involves literal copying, courts’ fair use assessments are likely to depend heavily on the Warhol decision and on the fourth factor, the effect on the market.

 

If AI models are used to produce content that competes directly with the artists whose works were taken to train them, the case for copyright infringement grows stronger. Going forward, the industry should move toward sustainable licensing schemes, voluntary data covenants, and clear legislative guidance.