We analyzed dozens of court cases and licensing deals to answer these key questions about copyright and generative AI. Note that this is not legal advice, and the right answer depends on the jurisdiction:
- Can copyright-protected data be used as training data? To be clarified this year. In the US, training data needs to be licensed if it is used to compete against the copyright owner.
- Are AI-generated works eligible for copyright protection? In most countries, substantial human involvement is required for eligibility.
- Who is the owner of the generative AI copyright? Depends on who is designated to be the creator of the work. However, so far, no copyrights have been awarded to a machine or software.
For more detailed answers:
1- Can copyright-protected data be used as training data?
In most jurisdictions, the legality of using copyrighted works as input for AI training is still being decided. If training can be subsumed under fair use, it would be allowed. However, the line between fair use and copyright infringement in this context is blurred in most jurisdictions.
Summary of specific status in some countries:
USA
Major generative AI companies like OpenAI and Google are investing in licensing copyrighted material.1 2 3 4 5 A user-generated content platform, Reddit, is expecting to earn ~$70M/year from licensing agreements,6 and Shutterstock claimed to have earned $104M.7
If LLM providers believe that they need licenses to this material, then they expect that the courts will not allow them to use copyrighted material in training, even if the material is publicly shared.
Thomson Reuters has won a lawsuit against Ross Intelligence, a defunct competitor, that used Thomson Reuters’ data in training its model.8
This question will be clarified when some of the ongoing court cases are concluded.9
The U.S. Copyright Office’s AI report, released in three parts, examines legal issues surrounding AI and copyright:
- July 31, 2024: It recommended federal legislation to prevent unauthorized digital replicas that falsely depict individuals.10
- Upcoming (2025): The final part will address AI training on copyrighted works, focusing on licensing and liability.11
France
France’s competition authority fined Google €250M over its use of news articles, including using them without permission to train Gemini.12 This doesn’t settle the question in France, but it shows how a government agency approached the subject.
Japan
Copyrighted work can be used for generative AI in most cases if:
- The material used in training is not copyright-infringing
- It does not unreasonably hurt the interests of the copyright holder.13
Fair use vs copyright infringement
Intellectual Property law is a special set of legislation safeguarding and enforcing the rights of creators and owners of creative works such as inventions, writings, music, designs and other intellectual property.
Copyright infringement can carry serious consequences: it typically leads to civil liability, and in some jurisdictions willful infringement is a crime that can result in imprisonment. Ignorance of IP law does not excuse liability or provide a legal defense against claims made by copyright owners.
Fair use doctrine allows for limited use of copyrighted material without needing permission from the copyright holder if said usage falls under certain categories, such as
- criticism/commentary
- news reporting
- teaching
- research
For instance, using copyrighted material for educational purposes could qualify as fair use, whereas using copyrighted material for commercial purposes without permission from the copyright holder is more likely to be considered copyright infringement.
Copyrighted data for training purposes: fair use or copyright infringement?
Training AI models on copyrighted data may be considered fair use in some circumstances. Yet the same does not necessarily apply to generating content. To put it more clearly: even where using someone else’s data to train AI models for your needs is permitted, what you do with the model’s generated output might still infringe copyright.
OpenAI has argued that training AI systems on copyrighted data should be considered fair use.14
Another important factor when assessing fair use is whether academic researchers and nonprofit organizations produced the training data and models. Startups are well aware of this, as it tends to reinforce their fair use defenses.
As an example, Stability AI – the distributor of Stable Diffusion – neither collected the model’s training data nor trained the models. Rather, it funded and coordinated this work with academics. The Stable Diffusion model is licensed by a German university, enabling Stability AI to transform its creation into a commercial service while remaining legally separate from it.15
However, when AI-generated works are copyrighted and then included in training sets, a legal conundrum can arise if the original creator did not license their use in this way. To ensure that laws around copyright and fair use are respected, producers of generative AI content should demonstrate due diligence in obtaining proper licenses when possible.
Can AI-created data be used as training data?
Once courts clarify all of the open questions above, we will also have the ingredients to determine how AI-created data can be used for training and how the trained models can be used. This is a critical question as AI-created data is already fueling a generative AI boom.16
2- Are AI-generated works eligible for copyright protection?
Whether AI-generated works are eligible for copyright protection varies from country to country. In general, however, substantial human involvement is required for eligibility.
There are doubts about whether works generated by AI tools should be eligible for copyright protection at all. The possible options are:
- AI-generated works do not meet the requirements for copyright protection because they are not the result of human creativity.
- AI-generated works should be eligible for copyright protection because they are the product of complex algorithms and programming. Moreover, the creators of these algorithms and programs should be recognized as the authors of the works generated by the AI.
For example, the U.S. Copyright Office released the second part of its AI report on January 29, 2025. The report clarified that AI-generated outputs qualify for copyright only if a human provides sufficient creative input; mere prompts do not count.17
This means that copyright laws do not currently protect works created solely by a machine. But if an individual can demonstrate substantial human involvement in its creation, then it is plausible they may receive copyright protection.
AI assisted artwork received copyright protection
In September 2022, the US Copyright Office made history by issuing a groundbreaking registration for the comic book Zarya of the Dawn, created using the text-to-image AI tool Midjourney.18 The author clarified that the artwork was AI-assisted, not solely AI-generated. She structured the story, designed the page layouts, and made artistic decisions to arrange the elements alongside the AI-generated images.

Figure 1. Drawings from the last page of the AI-assisted comic book Zarya of the Dawn. (Source: Zarya of the Dawn)
Award-winning Midjourney image was denied copyright protection
Another controversial generative art example is an AI-generated print that won an art competition at the Colorado State Fair.19 The creator said that he spent numerous weeks refining his prompts and manually selecting the finished product. The award-winning AI-generated art is shown in Figure 2 below.

Figure 2. The award-winning AI-generated print Theatre d’Opera Spatial. (Source: The Verge)
This image was denied copyright protection.20 Ultimately, whether AI-generated works are eligible for copyright protection raises the question of ownership: who would own the copyright in these cases? Countries that require human agency for authorship generally deny copyright protection to AI-generated works.
3- Who is the owner of the generative AI copyright?
The authorship and ownership rights of AI-generated works are also disputed.
Under the copyright law of most countries, the creator of a work is generally considered the copyright owner. However, when a work is created by AI, it is unclear who the creator is. Such ambiguity can create problems in determining who has the right to exploit the work, and in enforcing copyright violations.
There can be different solutions to this problem:
- AI itself as the creator of the work, in which case the AI owner would have the copyright.
- AI model’s human programmer as the creator, in which case the programmer would be the owner of the copyright.
- Humans that prepared the AI model’s training data as the creators.
AI or its developer as the copyright holder
Various countries, including Hong Kong, India, Ireland, New Zealand and the UK, explicitly grant authorship rights to programmers. For example, the United Kingdom provides copyright for works created entirely by computers. Yet, it deems that the author should be “the person by whom the arrangements necessary for the creation of the work are undertaken.”21
Consequently, there are several interpretations of who this “person” is. The generative model’s developer or operator? Or the model itself?
Stephen Thaler, creator of the Creativity Machine, is challenging the US Copyright Office’s stance on AI authorship. In June 2022, he sued after the office refused to register a digital image created by his system. Thaler asked for recognition of the Creativity Machine as the creator, not himself:
“My interest is the definition of what a person is,” he told Bloomberg Law, adding, “What I’m building is sentient machine intelligence. Maybe expansion to the term sentient organism would be in order.”22
Humans creating the training data as copyright holders
Take the case where authorship and copyright are given to the model’s programmer. Besides the programmed algorithm, generative AI models rely on an immense amount of data to create new content. For example, look at the Next Rembrandt painting in the figure below.

Figure 3. “The Next Rembrandt” is a computer-generated, 3D-printed painting based on the real paintings of the 17th-century Dutch painter Rembrandt. (Source: The Guardian)
Given this highly artistic output, it is hard to give authorship solely to the programmer while bypassing the immense input from the real artist Rembrandt.
Currently, the ownership of copyright is as debatable as the eligibility of the generated works. Both vary from country to country and are open to reform according to the improvements in generative AI use.
Generative AI copyright best practices
We recommend that businesses take three primary steps:
Identify your business’s risk appetite for generative AI
This helps identify which use cases make sense for your business. For example, you may not want AI-generated code in your business’s most valuable proprietary code.
Leverage vendor commitments to minimize your business’ risk
To address enterprises’ concerns, vendors like Adobe and Microsoft are committing to defend their clients in case the use of their solutions leads to legal issues.23
Embrace ethical AI
While building your business’ generative AI stack, pay attention to the data used in training and fine-tuning generative AI models from a copyright perspective.
Implement the 4 key principles of AI and manage your AI inventory.
Learn more about our recommendations regarding enterprise generative AI.
Why are copyrights important in generative AI?
Generative AI creates legal and ethical issues that must be addressed. One of the most important of these is the question of copyright, which determines who owns the rights to creative works and how those works can be used. Companies that rely on generative AI tools without knowing the local legislation on generative AI copyright risk reputational damage or legal penalties.
Glossary
Copyright: A type of intellectual property (IP) that protects tangible forms of artistic, literary, or intellectual works, such as paintings, books, and software. Copyright lasts for decades, often up to 70 years after the author’s death.
Patents: IP protections for inventions and new processes, differing from copyright by covering functional aspects rather than creative expressions.
Fair use: A legal doctrine allowing limited use of copyrighted material without permission under certain conditions, such as for criticism, comment, news reporting, teaching, or research.
Generative AI: Artificial intelligence systems that create new text, images, videos, and other media, raising debates on copyrightability and ownership of the generated outputs.
Inputs in AI training: The data used to train generative AI models, which can include copyrighted material. Issues arise about whether using such data without permission constitutes copyright infringement.
Outputs in AI: The new works produced by generative AI, such as text or images, and the debate over their copyrightability, given that human authorship is typically required for copyright protection.
Transformative use: A type of fair use where the new work adds something new with a different purpose or character, not substituting for the original work.
Creative control: The level of influence a human has over the creation of a work, which impacts whether AI-generated outputs are deemed copyrightable.
Copyright registration: The process of officially registering a work with the U.S. Copyright Office, which currently requires human authorship for protection.
For more on generative AI
- Generative AI: Top 11 Applications
- Generative AI in Healthcare: Benefits, Challenges, Potentials
- Generative AI in Fashion: 5 Use Cases with Case Studies
- Top 5 Use Cases of Generative AI in Education
- Top 4 Use Cases of Generative AI in Banking
External Links
- 1. Subscribe to read. Financial Times
- 2. What OpenAI's Latest News Partnerships Mean for the Industry's Future - Business Insider. Business Insider
- 3. A Content and Product Partnership with Vox Media | OpenAI.
- 4. The Atlantic product, content partnership with OpenAI - The Atlantic.
- 5. Exclusive: Reddit in AI content licensing deal with Google | Reuters. Reuters
- 6. SEC.gov.
- 7. Shutterstock’s AI-Licensing Business Generated $104 Million.
- 8. https://fingfx.thomsonreuters.com/gfx/legaldocs/xmvjbznbkvr/THOMSON%20REUTERS%20ROSS%20LAWSUIT%20fair%20use.pdf
- 9. Photo giant Getty took a leading AI image-maker to court. Now it's also embracing the technology | AP News. AP News
- 10. “Copyright and Artificial Intelligence Part 1: Digital Replicas.” The US Copyright Office. July 2024. Retrieved on April 3, 2025.
- 11. Copyright Office Releases Part 2 of Artificial Intelligence Report. Library of Congress
- 12. Related rights: the Autorité fines Google €250 million for non-compliance with some of its commitments made in June 2022 | Autorité de la concurrence.
- 13. https://www.bunka.go.jp/english/policy/copyright/pdf/94055801_01.pdf
- 14. “Before the United States Patent and Trademark Office Department of Commerce Comment Regarding Request for Comments on Intell.” USPTO. Accessed January 1, 2023.
- 15. Revolutionizing image generation by AI: Turning text in … - LMU Munich.
- 16. GitHub - lamini-ai/lamini: The Official Python Client for Lamini's API.
- 17. “Copyright and Artificial Intelligence Part 2: Copyrightability.” The US Copyright Office. January 2025. Retrieved on April 3, 2025.
- 18. Artist receives first known US copyright registration for latent diffusion AI art - Ars Technica. Ars Technica
- 19. Artwork generated using AI software Midjourney won a state competition | The Verge. The Verge
- 20. US Copyright Office denies protection for another AI-created image | Reuters. Reuters
- 21. Artificial Intelligence and Intellectual Property: copyright and patents - GOV.UK. GOV.UK
- 22. Artificial Intelligence Can Be Copyright Author, Suit Says.
- 23. Microsoft announces new Copilot Copyright Commitment for customers - Microsoft On the Issues. Microsoft