Training Data vs. Fair Use: Resolving the Copyright Dilemma in Generative AI Under Indian Law

Author: KRITI BANSAL
College: Maharaja Agrasen Institute of Management Studies


LinkedIn Profile: https://www.linkedin.com/in/kriti-bansal-law

Abstract

The generative AI copyright dilemma in India reduces to a structural mismatch between the architecture of the Copyright Act, 1957 and the technical reality of machine learning. The Act was designed for human-scale copying—a student photocopying a chapter, a critic quoting a passage. It was not designed for the ingestion of billions of documents by an automated system for the purpose of statistical pattern extraction.

The closed-list structure of Section 52 means that no exception can be judicially stretched to cover commercial AI training without effectively rewriting the statute. U.S. courts apply an open-ended four-factor fair use test and can weigh transformative use case by case; Indian courts cannot engage in that kind of balancing. India’s Section 52 is a statutory permission, not a judicial balancing exercise.

The DPIIT working paper acknowledges this gap and explores a hybrid licensing regime for “lawfully accessed” works, but this approach has been criticized for assuming infringement before Indian law has conclusively answered that question. The result is a state of legal uncertainty that simultaneously chills AI innovation and fails to adequately protect rights holders.

To the Point

The unauthorized ingestion of copyrighted works for training Generative AI models constitutes a prima facie infringement of the copyright owner’s exclusive right of reproduction under Section 14 of the Copyright Act, 1957.

India follows a restrictive, closed-list “fair dealing” approach under Section 52, unlike the open-ended “fair use” doctrine of the United States. Large-scale commercial AI training does not presently fit within any enumerated statutory exception under Indian law, rendering the practice legally precarious without explicit licensing.

As of June 2026, there is no settled Indian appellate ruling establishing that AI training constitutes fair dealing, and the DPIIT policy process has produced a working paper that itself remains contested.

Use of Legal Jargon

Prima Facie Infringement

Under Section 51 of the Copyright Act, 1957, copyright is infringed when any person, without a licence from the owner, does anything that the copyright owner has the exclusive right to do, or makes, sells, distributes, or publicly exhibits infringing copies. The algorithmic ingestion and tokenization of copyrighted text during AI training constitutes reproduction in a material form, triggering Section 51 liability.

Exclusive Rights Under Section 14

Section 14 grants the copyright owner a bundle of exclusive rights, including reproduction, adaptation, communication to the public, and translation. These rights are conferred subject to the other provisions of the Act, including the exceptions in Section 52. Any unlicensed act that engages these rights and falls outside Section 52 is an infringement.

Fair Dealing vs. Fair Use

| Doctrine | Jurisdiction | Nature | Test |
| --- | --- | --- | --- |
| Fair Dealing | India (Section 52) | Closed, exhaustive statutory list | Purpose must fit an enumerated category |
| Fair Use | USA (Section 107) | Open-ended equitable balancing | Four-factor test: purpose, nature, amount, market effect |

India’s Section 52 is not a balancing test but a statutory exception framework. Courts look at purpose, amount taken, and effect on the market as interpretive aids, but only within the confines of the enumerated categories.
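The two-step structure described above can be sketched as a simple gate. This is only an illustration of the reasoning, not a statement of the statute: the category labels, the `Use` type, and the `section_52_defence_available` function are hypothetical names, and the set lists only the Section 52 provisions discussed in this article (the Act's actual list is longer).

```python
from dataclasses import dataclass

# Hypothetical labels for the Section 52 provisions discussed in this article.
# The real statutory list is longer; this set is an assumption for illustration.
ENUMERATED_PURPOSES = {
    "private_use_or_research",        # Section 52(1)(a)
    "computer_programme_adaptation",  # Section 52(1)(aa)
    "transient_storage",              # Section 52(1)(b)
    "teacher_pupil_instruction",      # Section 52(1)(i)
}


@dataclass
class Use:
    purpose: str
    # Stand-in for the court's view of amount taken, market effect, and so on.
    passes_interpretive_aids: bool


def section_52_defence_available(use: Use) -> bool:
    """Closed-list logic: the purpose gate comes first, the aids second."""
    if use.purpose not in ENUMERATED_PURPOSES:
        # Outside the list, favourable factors never come into play.
        return False
    return use.passes_interpretive_aids


commercial_training = Use("commercial_ai_training", passes_interpretive_aids=True)
student_research = Use("private_use_or_research", passes_interpretive_aids=True)
```

In this sketch `commercial_training` fails at the first step even though its second field is favourable, which is the practical difference from an open-ended balancing test where every factor is weighed regardless of purpose.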

Key Doctrines Relevant to AI Training

Idea-Expression Dichotomy: Copyright protects only the tangible expression of an idea, not the underlying idea, facts, or data itself. AI models arguably learn patterns and ideas, not expressions—but the ingestion process itself involves copying the expression.

Modicum of Creativity: The Indian threshold for originality requires a minimum degree of skill and judgment, rejecting the pure “sweat of the brow” doctrine. Even compilations and databases with minimal creative input may attract copyright protection.

Transient or Incidental Storage: Section 52(1)(b) exempts transient or incidental storage occurring purely in the technical process of electronic transmission or communication to the public. AI training ingestion involves deliberate, non-transient retention and therefore falls outside this exception.

Text and Data Mining (TDM) Exception: India currently has no dedicated TDM exception in Section 52, unlike the European Union.

The Proof

Section 14 — The Rights Infringed

The following rights are directly engaged by AI training:

● Reproduction Right: Copying text, images, or code into training datasets.

● Adaptation Right: Transforming works into tokenized or vectorized formats.

● Communication to the Public: Potentially engaged when AI outputs are distributed.

Section 52 — The Closed List of Exceptions

Section 52(1)(a) — Fair Dealing for Research or Private Study

The research exception is the most commonly invoked defense. However, the provision covers fair dealing for “private or personal use, including research”, and commercial AI training by technology corporations is neither private use nor non-commercial research. The DPIIT working paper itself treats the existing Section 52 framework as insufficient for large-scale AI training.

Section 52(1)(b) — Transient or Incidental Storage

This exception covers only temporary buffer copies created purely in the technical process of electronic transmission. AI training ingestion is deliberate, durable, and purposive. The copy is retained for the entire duration of training and used to modify model parameters.

Section 52(1)(i) — Educational Exception

Section 52(1)(i) exempts acts done by a teacher or pupil in the course of instruction. Commercial AI training is not instruction in any cognizable sense.

Section 52(1)(aa) — Computer Programme Adaptations

This provision is limited to software adaptations and does not extend to the ingestion of literary, artistic, or musical works for AI training purposes.

The Infringement Mechanism

The AI training pipeline involves:

● Scraping: Mass reproduction of copyrighted works into training corpora.

● Tokenization and Vectorization: Transformation of works into numerical representations.

● Model Weight Encoding: The trained model may memorize and reproduce verbatim excerpts.

Each stage engages Section 14 rights without a corresponding Section 52 exception.
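The tokenization stage can be illustrated with a toy example. The sketch below is a deliberately simplified word-level tokenizer, assumed here for illustration only; production systems use subword schemes, and the function names are hypothetical. It shows why converting a work into numbers does not, by itself, discard the expression: the original passage is fully recoverable from the token IDs.

```python
def build_vocab(text):
    """Assign an integer ID to each distinct whitespace-separated word."""
    vocab = {}
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn text into a list of integer token IDs."""
    return [vocab[word] for word in text.split()]


def decode(token_ids, vocab):
    """Map token IDs back to words and rejoin them with single spaces."""
    id_to_word = {token_id: word for word, token_id in vocab.items()}
    return " ".join(id_to_word[token_id] for token_id in token_ids)


passage = "the quick brown fox jumps over the lazy dog"
vocab = build_vocab(passage)
token_ids = encode(passage, vocab)   # [0, 1, 2, 3, 4, 5, 0, 6, 7]
restored = decode(token_ids, vocab)  # identical to the original passage
```

Because the mapping is reversible, the numerical form is arguably still a copy of the expression in a different notation, which is the premise behind treating tokenization as engaging the reproduction or adaptation right.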

Case Laws

1. Eastern Book Company v. D.B. Modak (Supreme Court of India, 2007)

The Supreme Court held that copyright does not arise from labour alone; the work must contain a minimal degree of creativity and must not be merely trivial or mechanical. The Court rejected the pure “sweat of the brow” doctrine.

Application to AI Training: Even compilations, databases, and structured datasets involving minimal but real creative input attract copyright protection. Most curated datasets used in AI training will satisfy this threshold.

2. Chancellor, Masters and Scholars of the University of Oxford v. Narendra Publishing House (Delhi High Court)

The Delhi High Court held that guide books reproducing mathematical questions and providing solutions were protected under fair dealing as a review because they added distinct utility and educational value.

Application to AI Training: The case cuts against AI developers. The finding of transformative use rested on two things: educational value and a non-substitutive purpose. Commercial AI training satisfies neither requirement.

3. The Chancellor, Masters and Scholars of the University of Oxford v. Rameshwari Photocopy Services (Delhi High Court, 2016)

The Court held that photocopying and distributing course packs for teaching could fall within Section 52(1)(i) and that the educational exception is purpose-based.

Application to AI Training: Commercial AI training serves the purpose of building a profitable product, not instruction or private study, and therefore falls outside Section 52.

4. Section 51 and Section 58 of the Copyright Act, 1957

Under Section 51, copyright infringement occurs when any person, without a licence, performs any act that is the exclusive right of the copyright owner.

Under Section 58, infringing copies and plates used for their production are deemed the property of the copyright owner, who may initiate proceedings for recovery of possession or conversion.

5. Section 52(1)(b) — Transient Storage

The exception applies only where storage is transient or incidental and occurs purely as a mechanical byproduct of electronic transmission.

Application to AI Training: AI training ingestion is deliberate, retained for an extended period, and used to modify model parameters. It therefore falls outside the exception.

Conclusion

The copyright dilemma in generative AI training under Indian law is fundamentally a legislative gap problem. The Copyright Act, 1957 was not designed for machine-scale ingestion of copyrighted works, and its closed-list Section 52 framework cannot be judicially stretched to accommodate commercial AI training.

The key conclusions are:

● Infringement is the default position under Sections 14 and 51.

● No existing Section 52 exception covers large-scale commercial AI training.

● Most curated training datasets will attract copyright protection under the modicum of creativity standard established in Eastern Book Company v. D.B. Modak.

● Transformative use has limited traction in India and operates only within the confines of Section 52.

● Legislative intervention is necessary, whether through a TDM exception, a voluntary licensing regime, or a hybrid framework.

● The ANI v. OpenAI litigation will play a significant role in shaping future judicial and policy approaches.

Until an appellate court rules definitively, the legal position remains unsettled and AI developers operate at their own risk.

FAQs

Q1. Does India have a “fair use” defense for AI training?

No. India does not have an open-ended fair use doctrine. Section 52 of the Copyright Act, 1957 is a closed, exhaustive list of permitted acts. Commercial AI training does not fit within any enumerated category, and courts cannot create new exceptions through judicial balancing.

Q2. Can AI developers rely on the “research” exception under Section 52(1)(a)?

Only if the training is genuinely non-commercial and amounts to private or personal use, including research. Commercial AI training by technology corporations does not qualify. The DPIIT working paper treats the existing Section 52 framework as insufficient for large-scale AI training, implicitly confirming this limitation.

Q3. Does the transient storage exception under Section 52(1)(b) cover AI training?

No. Section 52(1)(b) covers only temporary, automatic buffer copies created purely as a mechanical byproduct of electronic transmission. AI training ingestion is deliberate, durable, and purposive: the copy is retained for the entire duration of training and used to modify model parameters. The exception is therefore inapplicable.

Q4. Are training datasets themselves protected by copyright?

Yes, if they involve a minimal degree of creativity in selection, arrangement, or annotation. Under Eastern Book Company v. D.B. Modak, pure labour is insufficient, but the modicum of creativity standard is satisfied by even minimal editorial judgment. Curated news archives, annotated legal databases, and editorially selected image collections will typically qualify.

Q5. What is the DPIIT’s current position on AI training and copyright?

The DPIIT working paper favors a licensed, compensation-based model for training on copyrighted works, rather than a broad TDM exception. The consultation has produced competing submissions: rights-holder groups support voluntary licensing, while pro-technology groups argue for a TDM exception. No final policy has been adopted as of June 2026.

Q6. What is the significance of the ANI v. OpenAI litigation?

It is India’s first generative AI copyright litigation and is shaping both judicial and policy thinking on whether AI training constitutes infringement under the Copyright Act, 1957. Until an appellate court rules definitively, the legal position remains unsettled. AI developers operating in India do so without the protection of a settled fair dealing defense.

Q7. What remedies are available to copyright owners against AI developers?

Civil remedies include injunctions, damages, and account of profits. Under Section 58 of the Copyright Act, 1957, all infringing copies are deemed the property of the copyright owner, who may initiate proceedings for recovery of possession or conversion. Criminal remedies under Section 63 are also available for willful infringement. Owners should assert their claims with care, however: under Section 60, groundless threats of legal proceedings expose the threatening party to declaratory suits and damages.

Q8. How does India’s position compare to the EU and USA?

The EU has enacted a specific TDM exception under Articles 3 and 4 of the Directive on Copyright in the Digital Single Market, permitting AI training on lawfully accessed works subject to opt-out rights for commercial mining. The USA applies an open-ended four-factor fair use test, and U.S. courts are actively litigating whether AI training constitutes fair use. India has neither a TDM exception nor an open-ended fair use doctrine, placing it in the most restrictive position of the three jurisdictions. The DPIIT consultation is India’s attempt to address this gap, but no legislative amendment has been enacted as of June 2026.

 

 
