
Copyright-Piracy Confusion

By Waydell D. Carvalho  ·  Cinderpoint  ·  First published 2026
Definition
Copyright-Piracy Confusion is the analytical error of treating generative AI as if it were a piracy technology. The framing collapses four legally distinct questions (how the training data was acquired, what gets reproduced, what training itself counts as, and what the model outputs) into the single fear that "AI is copying everything." That collapse leads to bad policy: broad restrictions on tools instead of targeted enforcement against actual infringement.

The problem in one paragraph

When public debate frames generative AI as piracy, distinct legal questions get smashed into one. The result is that policymakers, courts, and commentators all react to the same general fear without distinguishing between things that copyright law has long known how to evaluate separately. The label "piracy" imports an old distribution-era analogy (copying and distributing fixed files) into a technology that mostly doesn't work that way. The analogy obscures more than it reveals.

The four layers piracy talk collapses

Copyright in generative AI actually operates across four separate questions. Each has its own doctrine. Each has its own evidence requirements. Treating them as one is the confusion.

Layer 1: Acquisition

How did the training data get into the model's pipeline? Unauthorized downloads or pirated sources can violate the reproduction right regardless of what the model is later used for. This question is about the source of files, not the capabilities of the trained system.

Layer 2: Reproduction

If a model produces verbatim or near-verbatim passages from a copyrighted work, that's standard copyright infringement analysis: substantial similarity, market harm, the usual factors. Whether a human or a machine produced the copy is irrelevant to the test.

Layer 3: Training

Is the act of training on copyrighted material itself an infringement? Recent federal decisions evaluate this within fair use doctrine. Training can qualify as transformative when sufficiently distinct from the original use, but transformativeness has limits, particularly when the secondary use occupies the same commercial market.

Layer 4: Output

When the model produces something, does that output unlawfully reproduce protected expression? This is the decisive infringement question, and it should be evaluated tech-neutrally: the same way courts have always evaluated whether one work infringes another, regardless of what tool produced it.

Why the confusion produces bad policy

When all four layers blur into "AI is piracy," regulatory responses drift toward broad restrictions on the technology itself rather than toward enforcement aimed at the layer where an actual violation occurred. Restricting the tool punishes lawful and unlawful uses alike while leaving the genuine disputes, over acquisition, reproduction, training, and output, no better resolved.

The fix

Stop using "piracy" as the umbrella term for AI copyright disputes. Identify which layer is actually in play in any given case, then apply the doctrine built for that layer. Acquisition disputes go to acquisition law. Output disputes get evaluated as expressive works. Training disputes proceed through fair use analysis. Each gets the framework it was designed for.

The companion concept, the Wizard of AI Curtain Test, provides a heuristic for one specific part of this: how to evaluate output infringement without letting the existence of AI itself bias the analysis.

Why the name

"Copyright-Piracy" because that's the conflation: copyright claims made through piracy framing. "Confusion" because it's an analytical error, not a legal doctrine. The label is the diagnosis: the conversation is confused, and the confusion is the problem.

Cite this concept
Carvalho, W. D. (2026). Copyright-Piracy Confusion: Preserving Authorship-Blind Copyright Analysis in Generative AI. Cinderpoint. https://cinderpoint.com/ai/copyright-piracy-confusion/
About the author
Waydell D. Carvalho

Founder of Cinderpoint Systems LLC. M.S. Artificial Intelligence (MSAI), M.S. Management (MSM). Researches how systems fail under speed, opacity, and scale.

More by this author: SSRN · Zenodo