ARTIFICIAL INTELLIGENCE AND COPYRIGHT LAW: NAVIGATING UNCHARTED TERRITORY

Author: Mahak Chatkele, a student of Rabindranath Tagore University

Introduction

The rapid advancement of artificial intelligence has thrust copyright law into one of its most consequential reckonings in decades. Generative AI systems, capable of producing text, images, music, and code that rival human creativity, have exposed fundamental gaps in legal frameworks designed for a pre-digital era. These systems are trained on vast datasets scraped from the internet, often comprising millions of copyrighted works, raising urgent questions about whether such use constitutes infringement or falls within the bounds of fair use and similar doctrines worldwide.

At the heart of this debate lie three interconnected issues. First, the question of input: does feeding copyrighted material into a training dataset without the creator’s consent violate exclusive rights, or is it a transformative act akin to how human artists learn from existing works? Second, the question of output: when an AI system generates content that closely resembles a protected work, who bears responsibility, and does the output itself qualify for copyright protection at all? Third, and perhaps most philosophically challenging, is the question of authorship: can a machine be considered a creator under laws that have historically presumed human agency as a prerequisite for protection?

Courts, legislators, and regulatory bodies across jurisdictions are grappling with these questions in real time, often reaching inconsistent conclusions that reflect deeper disagreements about the purpose of copyright itself. Is the law meant primarily to protect creators’ economic interests, or to foster the broadest possible dissemination of knowledge and innovation? This article examines the evolving legal landscape surrounding AI and copyright, analyzing key litigation, legislative responses, and doctrinal tensions that will shape how creative and intellectual property rights are defined in an age increasingly shaped by machine-generated content.

Abstract

The emergence of generative artificial intelligence has created significant tension within established copyright frameworks, which were never designed to address non-human creation or large-scale automated use of protected works. This article critically examines the intersection of AI technology and copyright law, focusing on three core dilemmas: the legality of using copyrighted material to train AI models without explicit authorization, the copyrightability of AI-generated outputs, and the broader question of whether machines can qualify as authors under existing legal definitions. Drawing on recent litigation, legislative developments, and comparative analysis across major jurisdictions, the article evaluates how courts and policymakers are attempting to reconcile innovation with creator protection. Particular attention is given to the doctrine of fair use and its international equivalents, the economic implications for artists and content creators, and the risk of AI systems reproducing or closely mimicking protected expression. The analysis reveals a fragmented global response, with divergent approaches emerging in the United States, European Union, and other key markets. The article concludes by proposing a framework for balancing technological progress with the rights of original creators, arguing that meaningful reform must address both the input and output stages of AI-generated content.

AI and Copyright: The Core Legal Challenges

Training Data and the Input Problem

Generative AI models learn by processing enormous quantities of text, images, and other creative works, much of which is protected by copyright. Developers typically argue this process qualifies as fair use, comparing it to how human creators absorb influences from existing works. Critics counter that machine learning operates at a scale and speed no human could replicate, and that copying entire works into a training dataset, even temporarily, constitutes unauthorized reproduction. Several ongoing lawsuits in the United States, including cases brought by authors, visual artists, and news organizations, are testing this argument directly. Courts must now decide whether transformation happens at the training stage, the output stage, or not at all.

Output and Substantial Similarity

A separate but related concern involves AI-generated outputs that closely resemble existing copyrighted works. When a model produces an image, song, or passage that mirrors protected material, traditional infringement analysis, centered on substantial similarity, becomes difficult to apply. In a dispute between human creators, a court can examine whether the defendant had access to the original work and copied from it; AI outputs, by contrast, result from probabilistic pattern-matching across millions of data points. This raises unresolved questions about liability: should responsibility fall on the AI developer, the platform deploying the model, or the end user who wrote the prompt?

Authorship and the Human Requirement

Most copyright systems, including U.S. law, require human authorship for a work to qualify for protection. The U.S. Copyright Office has repeatedly denied registration to works generated entirely by AI without meaningful human creative input. This position raises practical difficulties as AI becomes further integrated into creative workflows, blurring the line between human-authored and machine-generated content. Jurisdictions differ in their approach: the United Kingdom, for instance, extends limited protection to computer-generated works, while most other countries maintain a strict human-authorship standard.

A Fragmented Global Response

No unified international standard currently governs AI and copyright. The European Union has introduced transparency obligations under the AI Act, requiring providers of general-purpose AI models to publish summaries of the content used for training, while the United States relies on evolving case law rather than comprehensive legislation. This regulatory fragmentation creates uncertainty for AI developers operating across borders and leaves creators without consistent protection. As litigation progresses and lawmakers respond, the coming years will likely determine whether copyright law adapts incrementally or requires fundamental restructuring to address AI’s unique challenges.

Case Law and Evidentiary Analysis

The legal debate surrounding AI and copyright is no longer theoretical; it is actively being tested through litigation across multiple jurisdictions. In the United States, The New York Times v. OpenAI and Microsoft stands as one of the most closely watched cases, with the publisher alleging that millions of its articles were used without authorization to train language models, and that the resulting outputs sometimes reproduced substantial portions of copyrighted content nearly verbatim. Similarly, Getty Images v. Stability AI addresses whether training an image-generation model on licensed stock photography without permission constitutes infringement, particularly where outputs retained visible watermarks traceable to Getty’s database.

In the visual arts, cases brought by artists against AI image generators like Midjourney and Stable Diffusion argue that these tools were trained on copyrighted artwork scraped without consent, enabling the models to mimic distinctive artistic styles. These disputes test whether style itself, though traditionally unprotected, becomes legally significant when replicated through systematic technological means.

Regulatory developments further support the argument for reform. The U.S. Copyright Office’s 2023 guidance denying registration to purely AI-generated works, combined with its ongoing study on generative AI, signals institutional recognition that existing frameworks are insufficient. Meanwhile, the EU AI Act’s transparency requirements, which oblige providers of general-purpose AI models to publish summaries of the content used for training, reflect a legislative acknowledgment that opacity in AI development undermines creators’ ability to enforce their rights.

Collectively, this body of litigation and regulatory action demonstrates that courts and lawmakers are actively grappling with unresolved questions of liability, fair use, and authorship. The outcomes of these cases will likely establish foundational precedents shaping how copyright law accommodates AI technology going forward.

Legal Doctrine and Terminology

Any rigorous analysis of AI and copyright necessitates engagement with established legal doctrine. The concept of fair use, codified under Section 107 of the U.S. Copyright Act, requires courts to weigh four statutory factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for or value of the copyrighted work. AI developers frequently invoke the transformative use doctrine, arguing that training constitutes a non-expressive, intermediate use distinct from the original work’s purpose.

Plaintiffs, conversely, raise claims of direct infringement and contributory infringement, alleging unauthorized reproduction and distribution under exclusive rights granted to copyright holders. The doctrine of substantial similarity remains central to establishing infringement of AI outputs, while de minimis copying may serve as a defense where reproduction is negligible.

Questions of authorship and originality—longstanding prerequisites for copyright protection—intersect with the work-for-hire doctrine when evaluating AI-assisted creations. Additionally, safe harbor provisions under the DMCA and principles of vicarious liability inform discussions of platform responsibility. Collectively, these doctrines form the analytical framework through which courts assess AI’s compatibility with copyright’s foundational principles.

Relevant Laws

United States

The primary framework is the Copyright Act of 1976, particularly Section 107 governing fair use. The U.S. Copyright Office’s 2023 Policy Statement clarified that works lacking meaningful human authorship cannot be registered, directly addressing AI-generated content. No dedicated federal AI-copyright legislation currently exists, leaving courts to interpret existing statutes.

European Union

The EU AI Act (2024) introduces transparency obligations, requiring providers of general-purpose AI models to publish summaries of the content used for training, including copyrighted material. This works alongside the Digital Single Market (DSM) Copyright Directive (2019), particularly Article 4, which permits text and data mining but allows rights holders to opt out through machine-readable reservations.
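
As a purely illustrative sketch of what a machine-readable reservation can look like in practice, the Python snippet below parses a hypothetical robots.txt file and checks whether a given crawler is permitted to fetch a page. The crawler name ExampleAIBot, the example.com URL, and the file contents are all invented for illustration. The snippet assumes robots.txt as one commonly used opt-out signal; whether such a signal satisfies Article 4 in a particular case is a legal question that the code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented crawler
# name used only for illustration; it is not a real user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # assumed URL
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        print(agent, "allowed:", may_crawl(ROBOTS_TXT, agent, page))
```

In this sketch the site reserves its content against the named AI crawler while leaving it open to all other agents. A crawler that honours the signal would skip the page, but the check itself is voluntary and does not establish whether a valid legal reservation has been made.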

United Kingdom

The UK’s Copyright, Designs and Patents Act 1988 notably grants limited protection to computer-generated works under Section 9(3), attributing authorship to “the person by whom the arrangements necessary for the creation of the work are undertaken.” The UK government has also conducted extensive consultations on introducing a text-and-data-mining exception for AI training.

India

India’s Copyright Act, 1957 requires human authorship, and the Copyright Office has not yet issued formal guidance on AI-generated works, leaving considerable regulatory ambiguity for creators and developers operating in this market.

Case Laws

Several landmark disputes are shaping this evolving area:

• The New York Times v. OpenAI and Microsoft (2023): Alleges unauthorized use of copyrighted articles for training, with outputs allegedly reproducing content nearly verbatim.

• Getty Images v. Stability AI (UK & US, 2023): Challenges the use of licensed stock photography in training image-generation models without consent.

• Andersen v. Stability AI (2023): Visual artists allege that image generators were trained on copyrighted artwork without authorization, enabling style replication.

• Thaler v. Perlmutter (2023): The U.S. District Court for the District of Columbia held that copyright protection requires human authorship, rejecting registration for a work created solely by an AI system. The D.C. Circuit affirmed that ruling in 2025.

• Authors Guild v. OpenAI (2023): A consolidated action by novelists alleging systematic use of copyrighted books to train large language models.

Most of these cases remain at various stages of litigation and will likely produce foundational precedents on training data use, output liability, and authorship standards.

Conclusion

The convergence of artificial intelligence and copyright law represents one of the defining legal challenges of this technological era. Existing frameworks, built around human creativity and traditional notions of reproduction, are being tested by systems capable of generating content at unprecedented scale and speed. The unresolved questions—whether training constitutes infringement, whether AI outputs can be copyrighted, and whether machines can be authors—do not have easy answers, and jurisdictions are responding with markedly different approaches. As litigation progresses and regulatory frameworks like the EU AI Act take effect, greater clarity may emerge. However, meaningful resolution will likely require legislative intervention rather than reliance on judicial interpretation alone. Balancing innovation with creators’ rights remains the central challenge, and how lawmakers navigate this balance will shape the creative and technological landscape for decades to come.

FAQs

Q1: Can AI-generated content be copyrighted?

Generally, no—most jurisdictions, including the U.S. and India, require human authorship. However, works with substantial human creative input combined with AI assistance may qualify for protection.

Q2: Is it legal to train AI models on copyrighted material?

This remains unsettled. Developers often claim fair use protection, while rights holders argue it constitutes unauthorized reproduction. Ongoing litigation will help clarify this issue.

Q3: Who is liable if an AI generates infringing content?

Liability could potentially fall on the developer, the deploying platform, or the end user, depending on jurisdiction and the specific circumstances of use.

Q4: Does the UK protect AI-generated works?

Yes, under Section 9(3) of the Copyright, Designs and Patents Act 1988, the UK grants limited protection, attributing authorship to the person who arranged the work’s creation.

Q5: What is the EU AI Act’s role in copyright?

It mandates transparency, requiring AI developers to disclose summaries of copyrighted material used in training general-purpose models.

Q6: How does fair use apply to AI training?

Courts assess four factors: purpose of use, nature of the original work, amount used, and market impact. Outcomes vary significantly based on case specifics.