We analyzed dozens of court cases and licensing deals to answer three key questions about copyright and generative AI. This is not legal advice: copyright law varies by jurisdiction and is actively evolving, so consult qualified legal counsel for your specific situation.
The Three Big Questions
- Can copyright-protected data be used as training data? In the US, training on copyrighted works is likely fair use IF you obtain the copies legally. Training on copies downloaded from pirate sites is not.
- Are AI-generated works eligible for copyright protection? In most countries, substantial human involvement is required for eligibility.
- Who is the owner of the generative AI copyright? It depends on who is recognized as the creator of the work. So far, no copyright has been granted to a machine or software.
1. Can copyright-protected data be used as training data?
In most jurisdictions, the legality of using copyrighted works to train AI models is still being decided. If training falls under fair use, it is allowed. However, the line between fair use and copyright infringement remains blurred in most jurisdictions.
Status in selected countries:
USA
Major generative AI companies like OpenAI and Google are investing in licensing copyrighted material.1 2 3 4 5 Reddit, a user-generated content platform, expects to earn ~$70M/year from licensing agreements,6 and Shutterstock reported earning $104M from AI licensing.7
Their willingness to pay for licenses suggests that LLM providers expect courts may not allow them to train on copyrighted material without permission, even when that material is publicly shared.
This question will become clearer as the ongoing court cases conclude.8
The U.S. Copyright Office’s AI report, released in three parts, examines legal issues surrounding AI and copyright:
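- July 31, 2024 (Part 1): It recommended federal legislation to prevent unauthorized digital replicas that falsely depict individuals.9
- January 2025 (Part 2): It concluded that AI-generated outputs qualify for copyright only if humans provide sufficient creative input.17
- Upcoming (2025, Part 3): The final part will address AI training on copyrighted works, focusing on licensing and liability.10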
France
The competition authority fined Google €250M for using news articles without permission in training its artificial intelligence system, Gemini.11 This doesn’t settle the status of this question in France but shows how a government agency approached the subject.
Japan
Copyrighted works can be used to train generative AI in most cases, provided that:
- The material used in training is not copyright-infringing
- It does not unreasonably harm the copyright holder’s interests.12
Fair use vs copyright infringement
Intellectual property (IP) law is a specialized body of law that protects and enforces the rights of creators and owners of creative works, such as inventions, writings, music, designs, and other forms of intellectual property.
Copyright infringement can carry serious consequences: it typically creates civil liability, and certain willful infringement, such as infringement for commercial gain, can be prosecuted as a crime punishable by imprisonment. Ignorance of IP law does not excuse liability or provide a defense against claims by copyright owners.
The fair use doctrine allows limited use of copyrighted material without the copyright holder's permission for purposes such as:
- criticism/commentary
- news reporting
- teaching
- research
Legal and Regulatory Challenges for AI Development
AI companies face significant legal challenges in two key areas: copyright infringement related to training data and antitrust concerns about market concentration.
Copyright Battles Over Training Data
Current Legal Status: Anthropic faced a potentially business-ending copyright lawsuit over allegations that it used pirated books to train its Claude AI without proper licensing, and has since agreed to a settlement. The legal status of using copyrighted works for AI training remains unclear in most jurisdictions.
Industry Licensing Response: Major AI companies are securing content licenses rather than relying solely on fair use defenses. Reddit expects to earn approximately $70 million annually from AI training licensing agreements, while Shutterstock reported $104 million in licensing revenue from AI companies.
International Approaches: France’s competition authority fined Google €250 million for using news articles without permission in training Gemini. Japan allows copyrighted work for AI training if it doesn’t unreasonably harm copyright holders’ interests. The U.S. Copyright Office will address AI training on copyrighted works in 2025.
Antitrust and Market Competition
Strategic Competition Plans: OpenAI’s confidential 2025 strategy document reveals plans to compete directly with “powerful incumbents who will leverage their distribution to advantage their products,” specifically naming Google, Apple, Microsoft, and Meta. The strategy includes developing search indexing capabilities and seeking default AI assistant status on major platforms.13
Regulatory Advocacy: The document outlines policy efforts advocating for user choice, stating, “Users should be able to pick their AI assistant. If you’re on iOS, Android, or Windows, you should be able to set ChatGPT as your default.” It also demands that “Google, Apple, Microsoft should offer users a choice for their default search engine and make their underlying indexes accessible to AI assistants.”
Market Access Challenges: OpenAI acknowledges that “the cards are stacked against us; we’re competing with powerful incumbents” and seeks regulatory intervention to access established distribution channels.14
Industry Impact
The resolution of copyright litigation will determine whether comprehensive licensing becomes required for AI training data. Antitrust investigations may examine exclusive partnerships and platform access restrictions. These outcomes will significantly impact development costs, market structure, and competitive dynamics between AI-first companies and established tech platforms.
Copyrighted data for training purposes: fair use or copyright infringement?
In the US, training AI models on lawfully obtained copyrighted data will likely be considered fair use. The same does not necessarily apply to the content a model generates: you may be able to train a model on someone else's data, but using that model's outputs can still infringe copyright, for example when an output closely reproduces a protected work.
OpenAI has argued that training AI systems on copyrighted data should be considered fair use.15
Another factor in assessing fair use is whether the training data and models were produced by academic researchers and nonprofit organizations. Startups are well aware of this, since research origins tend to strengthen their fair use defenses.
For example, Stability AI, the distributor of Stable Diffusion, neither collected the model's training data nor trained the models. Instead, it funded and coordinated this work with academics. The Stable Diffusion model is licensed by a German university, which lets Stability AI turn it into a commercial service while remaining legally separate from its creation.16
A further legal conundrum arises when copyrighted AI-generated works are used in training sets without a license from their original creator. To respect copyright and fair use rules, producers of generative AI content should show due diligence in obtaining proper licenses where possible.
2. Are AI-generated works eligible for copyright protection?
Eligibility for copyright protection varies by country, but in general, substantial human involvement is required.
There are two main positions on whether works generated by AI tools should be eligible for copyright protection:
- AI-generated works do not meet the requirements for copyright protection because they are not the result of human creativity.
- AI-generated works should be eligible for copyright protection because they are the product of complex algorithms and programming. Under this view, the creators of these algorithms and programs should be recognized as the authors of the works the AI generates.
The U.S. Copyright Office released Part 2 of its AI report, stating that AI-generated outputs qualify for copyright only if humans provide sufficient creative input. Writing prompts alone doesn't qualify.17
What counts as “sufficient creative input”:
Protected in part: The comic book “Zarya of the Dawn” (2022)
- The author structured the story
- Designed page layouts
- Made artistic decisions about element arrangement
- Used Midjourney to generate images, but humans controlled the composition (the individual Midjourney images themselves were not protected)
Not protected: AI-generated art “Théâtre D’opéra Spatial” (2023)
- Won the Colorado State Fair art competition
- Creator spent weeks crafting prompts
- Copyright Office denied protection: prompting isn't enough creative input
AI-assisted artwork received copyright protection
In September 2022, the US Copyright Office issued a registration for the comic book Zarya of the Dawn, created using the text-to-image AI tool Midjourney.18 The author clarified that the artwork was AI-assisted, not solely AI-generated: she structured the story, designed the page layouts, and made artistic decisions to arrange the elements alongside the AI-generated images. In February 2023, the Office narrowed the registration to the text and to the selection and arrangement of the comic's elements, excluding the individual Midjourney images.
Figure 1. Drawings from the last page of the AI-assisted comic book Zarya of the Dawn. (Source: Zarya of the Dawn)
The award-winning Midjourney image was denied copyright protection.
Another controversial example of generative art is an AI-generated print that won an art competition at the Colorado State Fair.19 Its creator said he spent weeks refining prompts and manually selecting the finished image. The award-winning image is shown in Figure 2 below.
Figure 2. The award-winning AI-generated print Théâtre D’opéra Spatial. (Source: The Verge)
This image was denied copyright protection.20 These cases also raise the question of ownership: who, if anyone, owns the copyright in AI-generated works? Countries that require human authorship generally deny copyright protection to AI-generated works.
3. Who is the owner of the generative AI copyright?
The authorship and ownership of AI-generated works are also disputed.
Under the copyright law of most countries, the creator of a work is generally considered the copyright owner. However, when AI creates a work, it is unclear who the creator is. This ambiguity makes it hard to determine who has the right to exploit the work and who can enforce the copyright against infringers.
The programmer approach: Some countries (the UK, India, Ireland, New Zealand, and Hong Kong) allow programmers to claim authorship of computer-generated works. The author is “the person by whom the arrangements necessary for the creation of the work are undertaken.”21
Problem: What about the training data creators? If an AI trained on Rembrandt paintings generates new artwork, does the programmer get full credit while Rembrandt’s contribution is ignored?
Figure 3. “The Next Rembrandt” is a computer-generated 3D painting that was inspired by the real paintings of 17th-century Dutch painter Rembrandt. (Source: The Guardian)
The user approach: If a person provides substantial creative direction (beyond simple prompts), they might qualify as an author. But courts are still defining “substantial.”
The AI-as-author approach: Stephen Thaler sued the US Copyright Office in 2022, arguing his “Creativity Machine” should be recognized as the author of its works. Courts rejected this. No jurisdiction recognizes AI as a legal “person” capable of holding copyright.
Practical reality: Many businesses sidestep this question by having employees or contractors create AI-generated content under work-for-hire agreements, so the company owns whatever copyright the work carries.
How AI Companies Are Actually Handling This
Licensing Deals Exploded in 2024-2025
Rather than fight every copyright battle, major AI companies are signing licensing deals:
OpenAI partnerships:
- Financial Times (April 2024)
- Vox Media (May 2024)
- The Atlantic (May 2024)
- Reddit (May 2024)
- Multiple news publishers (2024-2025)
Google deals:
- Reddit licensing agreement (February 2024)
- Multiple news organizations
Revenue for content providers:
- Reddit: ~$70M annually from AI licensing
- Shutterstock: $104M from AI licensing deals (2024)
These deals suggest AI companies don’t believe they can rely solely on fair use defenses.
Generative AI copyright best practices
For content creators:
- Register your copyrights: Only registered works qualify for compensation under the Anthropic settlement (US)
- Check the Works List: Anthropic published a list of 500,000+ books on the settlement website (October 2, 2025)
- Decide on opt-out: By January 7, 2026, decide whether to accept the settlement or opt out to pursue an individual claim
- Use opt-out mechanisms: Many AI companies now offer ways for creators to exclude their works from training
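One widely used opt-out mechanism is a site's robots.txt file. The sketch below assumes the crawler tokens `GPTBot` (OpenAI) and `Google-Extended` (Google) as those vendors have documented them; whether a given crawler honors robots.txt, and which token it uses, should be confirmed in each vendor's current documentation. It uses Python's standard `urllib.robotparser` to check which agents a hypothetical robots.txt blocks; the site URL, the `SomeSearchBot` agent, and the `blocked_agents` helper are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a creator's site: it blocks two AI-training
# crawlers (tokens assumed from vendor docs) and allows every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, url: str, agents: list[str]) -> list[str]:
    """Return the user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    url = "https://example.com/portfolio/artwork-01.png"
    print(blocked_agents(ROBOTS_TXT, url, ["GPTBot", "Google-Extended", "SomeSearchBot"]))
```

Note that robots.txt only signals crawlers going forward; it does not remove works that were already collected.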
For businesses deploying AI:
- Assess risk tolerance: Decide which use cases justify AI-generated content
- Document human involvement: If claiming copyright in AI outputs, document your creative process
- Review vendor commitments: Understand what legal protection your AI vendor provides
- Implement AI governance: Track what AI tools are used, for what purposes, with what training data
- Stay updated: This is evolving rapidly; 2025 rulings won’t be the final word
For AI companies:
- License proactively: Licensing deals are cheaper than settlements
- Document data provenance: Be able to prove where every piece of training data came from
- Avoid pirate sites: Anthropic learned this the expensive way
- Prepare for outputs litigation: The Anthropic settlement covered only training data, not outputs
- Plan for regional variations: What’s legal in the US might not be in the EU or other jurisdictions
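Documenting data provenance can be as simple as storing a content hash, a source, and a license for every training item. The following is a minimal sketch under stated assumptions: the `ProvenanceRecord` fields, the `CLEARED_LICENSES` labels, and the JSON Lines manifest format are hypothetical choices for illustration, not an industry standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Hypothetical license labels a team has decided are cleared for training.
CLEARED_LICENSES = {"licensed-agreement", "public-domain", "cc-by"}


@dataclass(frozen=True)
class ProvenanceRecord:
    """One training item and where it came from (field names are illustrative)."""
    sha256: str        # hash of the exact bytes used in training
    source_url: str    # where the copy was obtained
    license_id: str    # license or agreement that covers training use
    acquired_via: str  # e.g. "publisher-feed", "purchase", "crawl"


def make_record(content: bytes, source_url: str, license_id: str, acquired_via: str) -> ProvenanceRecord:
    """Hash the content and bundle it with its source and license metadata."""
    return ProvenanceRecord(hashlib.sha256(content).hexdigest(), source_url, license_id, acquired_via)


def uncleared(records: list[ProvenanceRecord]) -> list[ProvenanceRecord]:
    """Return records whose license is not on the cleared list."""
    return [r for r in records if r.license_id not in CLEARED_LICENSES]


if __name__ == "__main__":
    records = [
        make_record(b"chapter one ...", "https://publisher.example/book-1", "licensed-agreement", "publisher-feed"),
        make_record(b"scanned text ...", "https://mirror.example/book-2", "unknown", "crawl"),
    ]
    # One JSON object per line (JSON Lines) keeps the manifest easy to append to and audit.
    for record in records:
        print(json.dumps(asdict(record)))
    # Items that need a license or removal before training.
    print([r.source_url for r in uncleared(records)])
```

Hashing the content lets a team later show exactly which bytes a record refers to, and the `uncleared` filter surfaces items, such as crawled copies of unknown origin, that need a license or removal before training.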
Why are copyrights important in generative AI?
Generative AI creates legal and ethical issues that must be addressed. One of the most important is copyright, which determines who owns the rights to creative works and how they may be used. Companies that rely on generative AI tools without understanding local legislation on generative AI and copyright risk reputational damage or legal fines.
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos, which reached 7-digit annual recurring revenue and a 9-digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.