Copyright: A type of intellectual property (IP) that protects tangible forms of artistic, literary, or intellectual works, such as paintings, books, and software. Copyright lasts for decades, often up to 70 years after the author's death.
Patents: IP protections for inventions and new processes, differing from copyright by covering functional aspects rather than creative expressions.
Fair use: A legal doctrine allowing limited use of copyrighted material without permission under certain conditions, such as for criticism, comment, news reporting, teaching, or research.
Generative AI: Artificial intelligence systems that create new text, images, videos, and other media, raising debates on copyrightability and ownership of the generated outputs.
Inputs in AI training: The data used to train generative AI models, which can include copyrighted material. Issues arise about whether using such data without permission constitutes copyright infringement.
Outputs in AI: The new works produced by generative AI, such as text or images, and the debate over their copyrightability, given that human authorship is typically required for copyright protection.
Transformative use: A type of fair use where the new work adds something new with a different purpose or character, not substituting for the original work.
Creative control: The level of influence a human has over the creation of a work, which impacts whether AI-generated outputs are deemed copyrightable.
Copyright registration: The process of officially registering a work with the U.S. Copyright Office, which currently requires human authorship for protection.

Generative AI Copyright Concerns & 3 Best Practices in 2026

Cem Dilmegani

updated on Jan 21, 2026

We analyzed dozens of court cases and licensing deals to answer these key questions about copyright and generative AI. However, this is not legal advice. Copyright law varies by jurisdiction and is actively evolving. Consult qualified legal counsel for your specific situation.

The Three Big Questions

Can copyright-protected data be used as training data? In the US, training on copyrighted works is likely fair use IF you obtain the copies legally. Downloading from pirate sites is not.
Are AI-generated works eligible for copyright protection? In most countries, substantial human involvement is required for eligibility.
Who is the owner of the generative AI copyright? Depends on who is designated to be the creator of the work. However, so far, no copyrights have been awarded to a machine or software.

1. Can copyright-protected data be used as training data?

In most jurisdictions, the legality of using copyrighted works as input to AI training algorithms is being decided. If training can be subsumed under fair use, it would be allowed. However, in this case, the line between fair use and copyright infringement is blurred in most jurisdictions.

Summary of specific status in some countries:

USA

Major generative AI companies like OpenAI and Google are investing in licensing copyrighted material.¹²³⁴⁵ A user-generated content platform, Reddit, is expecting to earn ~$70M/year from licensing agreements,⁶ and Shutterstock claimed to have earned $104M.⁷

The fact that LLM providers are paying for licenses suggests they are not confident that courts will allow them to train on copyrighted material without permission, even if the material is publicly shared.

This question will be clarified when some of the ongoing court processes are concluded.⁸

The U.S. Copyright Office’s AI report, released in three parts, examines legal issues surrounding AI and copyright:

July 31, 2024: It recommended federal legislation to prevent unauthorized digital replicas that falsely depict individuals.⁹
Upcoming (2025): The final part will address AI training on copyrighted works, focusing on licensing and liability.¹⁰

France

The competition authority fined Google €250M for using news articles without permission in training its artificial intelligence system, Gemini.¹¹ This doesn’t settle the status of this question in France but shows how a government agency approached the subject.

Japan

Copyrighted work can be used for generative AI in most cases if:

The material used in training is not copyright-infringing
It does not unreasonably harm the copyright holder’s interests.¹²

Fair use vs copyright infringement

Intellectual Property law is a specialized set of legislation that safeguards and enforces the rights of creators and owners of creative works, such as inventions, writings, music, designs, and other forms of intellectual property.

Copyright infringement can carry serious penalties, and in some jurisdictions willful infringement is a crime that can result in imprisonment. Ignorance of IP law when using copyrighted material does not excuse liability or provide a legal defense against claims by copyright owners.

Fair use doctrine allows for limited use of copyrighted material without needing permission from the copyright holder if said usage falls under specific categories, such as

criticism/commentary
news reporting
teaching
research

Legal and Regulatory Challenges for AI Development

AI companies face significant legal challenges in two key areas: copyright infringement related to training data and antitrust concerns about market concentration.

Copyright Battles Over Training Data

Current Legal Status: Anthropic faces a potentially business-ending copyright lawsuit over allegations that it used pirated books to train its Claude AI without proper licensing. The legal status of using copyrighted works for AI training remains unclear in most jurisdictions.

Industry Licensing Response: Major AI companies are securing content licenses rather than relying on fair use defenses. Reddit expects to earn approximately $70 million annually from AI training licensing agreements, while Shutterstock reported $104 million in licensing revenue from AI companies.

International Approaches: France’s competition authority fined Google €250 million for using news articles without permission in training Gemini. Japan allows copyrighted work for AI training if it doesn’t unreasonably harm copyright holders’ interests. The U.S. Copyright Office will address AI training on copyrighted works in 2025.

Antitrust and Market Competition

Strategic Competition Plans: OpenAI’s confidential 2025 strategy document reveals plans to compete directly with “powerful incumbents who will leverage their distribution to advantage their products,” specifically naming Google, Apple, Microsoft, and Meta. The strategy includes developing search indexing capabilities and seeking default AI assistant status on major platforms.¹³

Regulatory Advocacy: The document outlines policy efforts advocating for user choice, stating, “Users should be able to pick their AI assistant. If you’re on iOS, Android, or Windows, you should be able to set ChatGPT as your default.” It also demands that “Google, Apple, Microsoft should offer users a choice for their default search engine and make their underlying indexes accessible to AI assistants.”

Market Access Challenges: OpenAI acknowledges that “the cards are stacked against us; we’re competing with powerful incumbents” and seeks regulatory intervention to access established distribution channels.¹⁴

Industry Impact

The resolution of copyright litigation will determine whether comprehensive licensing becomes required for AI training data. Antitrust investigations may examine exclusive partnerships and platform access restrictions. These outcomes will significantly impact development costs, market structure, and competitive dynamics between AI-first companies and established tech platforms.

Copyrighted data for training purposes: fair use or copyright infringement?

Training AI models on copyrighted data will likely be considered fair use. Yet, the same cannot necessarily apply to generating content. To put it more clearly, you can utilize someone else’s data to train AI models in alignment with your needs. However, what you do with the generated output of this model might infringe copyright law.

OpenAI expressed that using ML algorithms to train AI programs by examining copyrighted data should be considered fair use.¹⁵

Another essential factor to consider when assessing fair use is whether academic researchers or nonprofit organizations produced the training data and models. Startups are well aware of this, as it tends to reinforce their fair use defenses.

As an example, Stability AI – the distributor of Stable Diffusion – neither collected the model’s training data nor trained the models. Rather, it funded and coordinated this work with academics. The Stable Diffusion model is licensed by a German university, enabling Stability AI to transform its creation into a commercial service while remaining legally separate from it.¹⁶

However, when AI-generated works are copyrighted and then used for training sets, a legal conundrum can arise if the original creator did not license their use in such a way. To ensure that laws around copyright and fair use are respected, producers of generative AI content should demonstrate due diligence in obtaining proper licenses when possible.

2. Are AI-generated works eligible for copyright protection?

Whether AI-generated works are eligible for copyright protection varies by country. In general, however, substantial human involvement is required for eligibility.

There are doubts about whether works generated by AI tools should be eligible for copyright protection. The possible options are:

AI-generated works do not meet the requirements for copyright protection because they are not the result of human creativity.
AI-generated works should be eligible for copyright protection because they are the product of complex algorithms and programming. Moreover, the creators of these algorithms and programs should be recognized as the authors of the works generated by the AI.

The Copyright Office released Part 2 of its AI report, stating that AI-generated outputs qualify for copyright only if humans provide sufficient creative input. Writing a prompt doesn’t qualify.¹⁷
What counts as “sufficient creative input”:

Protected: The comic book “Zarya of the Dawn” (2022)

The author structured the story
Designed page layouts
Made artistic decisions about element arrangement
Used Midjourney to generate images, but humans controlled the composition

Not protected: AI-generated art “Théâtre D’opéra Spatial” (2023)

Denied protection
Won the Colorado State Fair art competition
Creator spent weeks crafting prompts
Copyright Office: Prompting isn’t enough creative input

AI-assisted artwork received copyright protection

In September 2022, the US Copyright Office made history by issuing a groundbreaking registration for the comic book Zarya of the Dawn, created using the text-to-image AI tool Midjourney.¹⁸ The author clarified that the artwork was AI-assisted, not solely AI-generated. She structured the story, designed the page layouts, and made artistic decisions to arrange the elements alongside the AI-generated images.

Figure 1. Drawings from the last page of AI-generated comic book Zarya of the Dawn. (Source: Zarya of the Dawn)

The award-winning Midjourney image was denied copyright protection.

Another controversial example of generative art is an AI-generated print that won an art competition at the Colorado State Fair.¹⁹ The creator said that he spent numerous weeks curating the perfect prompts and manually identifying the finished product. The award-winning AI-generated art is shown in Figure 2 below.

Figure 2. The award-winning AI-generated print Théâtre D’opéra Spatial. (Source: The Verge)

This image was denied copyright protection.²⁰ Ultimately, whether AI-generated works are eligible for copyright protection raises questions about ownership rights and who would own the copyright in such cases. Countries requiring human agency for authorship generally deny copyright protection to AI-generated works.

3. Who is the owner of the generative AI copyright?

The authorship and ownership rights of AI-generated works are also disputable.

Under the copyright law of most countries, the creator of a work is generally considered the copyright owner. However, when AI creates a work, it is unclear who the creator is. Such ambiguity can create problems in determining who has the right to exploit the work and in enforcing copyright against infringers.

The programmer approach: Some countries (the UK, India, Ireland, New Zealand, and Hong Kong) allow programmers to claim authorship of computer-generated works. The “person by whom arrangements necessary for creation are undertaken” owns the copyright.²¹

Problem: What about the training data creators? If an AI trained on Rembrandt paintings generates new artwork, does the programmer get full credit while Rembrandt’s contribution is ignored?

Figure 3. “The Next Rembrandt” is a computer-generated 3D painting that was inspired by the real paintings of 17th-century Dutch painter Rembrandt. (Source: The Guardian)

The user approach: If a person provides substantial creative direction (beyond simple prompts), they might qualify as an author. But courts are still defining “substantial.”

The AI-as-author approach: Stephen Thaler sued the US Copyright Office in 2022, arguing his “Creativity Machine” should be recognized as the author of its works. Courts rejected this. No jurisdiction recognizes AI as a legal “person” capable of holding copyright.

Practical reality: Most businesses avoid this question by having employees or contractors create AI-generated content under work-for-hire agreements. The company owns it regardless.

How AI Companies Are Actually Handling This

Licensing Deals Exploded in 2024-2025

Rather than fight every copyright battle, major AI companies are signing licensing deals:

OpenAI partnerships:

Financial Times (April 2024)
Vox Media (May 2024)
The Atlantic (May 2024)
Reddit ($70M/year)
Multiple news publishers (2024-2025)

Google deals:

Reddit licensing agreement (February 2024)
Multiple news organizations

Revenue for content providers:

Reddit: ~$70M annually from AI licensing
Shutterstock: $104M from AI licensing deals (2024)

These deals suggest AI companies don’t believe they can rely solely on fair use defenses.

Generative AI copyright best practices

For content creators:

Register your copyrights: Only registered works qualify for Anthropic settlement compensation (US)
Check the Works List: Anthropic published a list of 500,000+ books (October 2, 2025) on the settlement website
Decide on opt-out: By January 7, 2026, decide whether to take the settlement or pursue individual claims
Use opt-out mechanisms: Many AI companies now offer ways for creators to exclude their works from training

For businesses deploying AI:

Assess risk tolerance: Decide which use cases justify AI-generated content
Document human involvement: If claiming copyright in AI outputs, document your creative process
Review vendor commitments: Understand what legal protection your AI vendor provides
Implement AI governance: Track what AI tools are used, for what purposes, with what training data
Stay updated: This is evolving rapidly; 2025 rulings won’t be the final word
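
The governance and documentation points above could be tracked in a simple register. The following is a minimal sketch in Python, not a compliance tool: the `AIUsageEntry` fields, the tool names, and the review rule are hypothetical choices made for illustration, and a real register should be designed with legal counsel.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class AIUsageEntry:
    """One row of a hypothetical AI usage register."""
    tool: str                # which AI tool was used
    purpose: str             # what the output is used for
    human_contribution: str  # notes on the human creative process
    vendor_indemnity: bool   # whether the vendor contract offers legal protection


def needs_review(entries):
    """Flag entries with no documented human input or no vendor protection."""
    return [e for e in entries if not e.human_contribution or not e.vendor_indemnity]


def to_csv(entries):
    """Export the register as CSV text so it can be shared or archived."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(AIUsageEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


# Illustrative data only; tool names are placeholders.
entries = [
    AIUsageEntry("hypothetical-image-tool", "marketing banner",
                 "designer composed and retouched the layout", True),
    AIUsageEntry("hypothetical-text-tool", "blog draft", "", False),
]
flagged = needs_review(entries)
report = to_csv(entries)
```

Here `flagged` contains only the second entry, because it records neither a human contribution nor a vendor commitment. Keeping the notes in free text is a deliberate simplification; what counts as sufficient human involvement is a legal question, not something the code decides.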

For AI companies:

License proactively: Licensing deals are cheaper than settlements
Document data provenance: Be able to prove where every piece of training data came from
Avoid pirate sites: Anthropic learned this the expensive way
Prepare for outputs litigation: Anthropic settlement only covered training data, not outputs
Plan for regional variations: What’s legal in the US might not be in the EU or other jurisdictions
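
As an illustration of the data provenance point above, the sketch below records where each training document came from and filters out anything without a cleared status. It is a minimal example under stated assumptions: the `ProvenanceRecord` structure, the license labels in `ALLOWED_LICENSES`, and the example URLs are hypothetical, and which statuses actually permit training is a legal determination that varies by jurisdiction.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical labels; a real policy would be defined with legal counsel.
ALLOWED_LICENSES = {"licensed", "public_domain", "owned"}


@dataclass(frozen=True)
class ProvenanceRecord:
    source_url: str      # where the copy was obtained
    license_status: str  # e.g. "licensed", "public_domain", "unknown"
    sha256: str          # fingerprint of the exact bytes that were ingested


def make_record(content: bytes, source_url: str, license_status: str) -> ProvenanceRecord:
    """Hash the content so the record can later be matched to the stored file."""
    digest = hashlib.sha256(content).hexdigest()
    return ProvenanceRecord(source_url, license_status, digest)


def cleared_for_training(record: ProvenanceRecord) -> bool:
    """Allow a document only if its license status is on the approved list."""
    return record.license_status in ALLOWED_LICENSES


# Illustrative data only.
records = [
    make_record(b"Chapter 1 ...", "https://example.com/licensed-book", "licensed"),
    make_record(b"Scraped text ...", "https://example.com/unknown-origin", "unknown"),
]
cleared = [r for r in records if cleared_for_training(r)]
```

Only the first record passes the filter. Storing a content hash alongside the source makes it possible to show later which exact copy entered the training set and how it was acquired.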

Why are copyrights important in generative AI?

Generative AI creates legal and ethical issues that must be addressed. One of the most important of these is the question of copyright, which determines who owns the rights to creative works and how they may be used. Companies that rely on generative AI tools without understanding local legislation on generative AI and copyright risk reputational damage and legal fines.

Reference Links

1. Financial Times.
2. "What OpenAI's Latest News Partnerships Mean for the Industry's Future." Business Insider.
3. "A Content and Product Partnership with Vox Media." OpenAI.
4. "The Atlantic product, content partnership with OpenAI." The Atlantic.
5. "Exclusive: Reddit in AI content licensing deal with Google." Reuters.
6. SEC.gov.
7. "Shutterstock’s AI-Licensing Business Generated $104 Million."
8. "Photo giant Getty took a leading AI image-maker to court. Now it's also embracing the technology." AP News.
9. "Copyright and Artificial Intelligence Part 1: Digital Replicas." The US Copyright Office. July 2024. Retrieved April 3, 2025.
12. https://www.bunka.go.jp/english/policy/copyright/pdf/94055801_01.pdf
13. "A copyright lawsuit over pirated books could result in ‘business-ending’ damages for Anthropic." Fortune.
14. United States Department of Justice (homepage).
15. "Before the United States Patent and Trademark Office Department of Commerce Comment Regarding Request for Comments on Intell." USPTO. Accessed January 1, 2023.
16. "Revolutionizing image generation by AI: Turning text in …" LMU Munich.
17. "Copyright and Artificial Intelligence Part 2: Copyrightability." The US Copyright Office. January 2025. Retrieved April 3, 2025.
18. "Artist receives first known US copyright registration for latent diffusion AI art." Ars Technica.
19. "Artwork generated using AI software Midjourney won a state competition." The Verge.
20. "US Copyright Office denies protection for another AI-created image." Reuters.
21. "Artificial Intelligence and Intellectual Property: copyright and patents." GOV.UK.

Cem Dilmegani

Principal Analyst

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
