AIMultiple Research
GenAI
Updated on Apr 6, 2025

Generative AI Copyright Concerns & 3 Best Practices [2025]

We analyzed dozens of court cases and licensing deals to answer key questions about copyright and generative AI. This is not legal advice, and the right answer depends on the jurisdiction.

For more detailed answers:

In most jurisdictions, the legality of using copyrighted works as input for AI training is still being decided. If training qualifies as fair use, it would be allowed. However, the line between fair use and copyright infringement remains blurred in most jurisdictions.

Summary of specific status in some countries:

USA

Major generative AI companies like OpenAI and Google are investing in licensing copyrighted material.1 2 3 4 5 A user-generated content platform, Reddit, is expecting to earn ~$70M/year from licensing agreements,6 and Shutterstock claimed to have earned $104M.7

If LLM providers believe that they need licenses to this material, they presumably expect that courts may not allow them to use copyrighted material in training, even if the material is publicly shared.

Thomson Reuters has won a lawsuit against Ross Intelligence, a defunct competitor, that used Thomson Reuters’ data in training its model.8

This question will be clarified when some of the ongoing court processes are concluded.9

The U.S. Copyright Office’s AI report, released in three parts, examines legal issues surrounding AI and copyright:

  • July 31, 2024: It recommended federal legislation to prevent unauthorized digital replicas that falsely depict individuals.10
  • Upcoming (2025): The final part will address AI training on copyrighted works, focusing on licensing and liability.11

France

France’s competition authority fined Google €250M over its use of news articles, including using them without permission to train Gemini.12 This doesn’t settle the question in France, but it shows how a government agency approached the subject.

Japan

Copyrighted work can be used for generative AI in most cases if:

  • The material used in training is not copyright-infringing
  • It does not unreasonably hurt the interests of the copyright holder.13

Intellectual Property law is a special set of legislation safeguarding and enforcing the rights of creators and owners of creative works such as inventions, writings, music, designs and other intellectual property. 

Copyright infringement can carry serious consequences, including civil liability and, in some jurisdictions and for willful infringement, criminal penalties such as imprisonment. Ignorance of IP law does not excuse liability or provide a legal defense against claims made by copyright owners.

The fair use doctrine allows limited use of copyrighted material without permission from the copyright holder if the use falls under certain categories, such as:

  • criticism/commentary
  • news reporting
  • teaching
  • research

For instance, using copyrighted material for educational purposes could qualify as fair use, whereas using it for commercial purposes without permission from the copyright holder is more likely to be considered copyright infringement.

Training AI models on copyrighted data may be considered fair use, although courts are still testing this. The same does not necessarily apply to generated content. Put plainly: you may be able to use someone else’s data to train AI models for your needs, but what you do with the model’s output might still infringe copyright.

OpenAI has argued that training AI systems on copyrighted data should be considered fair use.14

Another important factor in assessing fair use is whether academic researchers and nonprofit organizations produced the training data and models. Startups are well aware of this, as it tends to reinforce their fair use defenses.

As an example, Stability AI – the distributor of Stable Diffusion – neither collected the model’s training data nor trained the models. Rather, it funded and coordinated this work with academics. The Stable Diffusion model is licensed by a German university, enabling Stability AI to transform its creation into a commercial service while remaining legally separate from it.15

However, when AI-generated works are copyrighted and then used for training sets, a legal conundrum can arise if the original creator did not license its use in such a way. To ensure that laws around copyright and fair use are respected, producers of generative AI content should demonstrate due diligence in obtaining proper licenses when possible.

Can AI-created data be used as training data?

Once courts clarify the open questions above, we will have the ingredients to determine how AI-created data can be used for training and how the resulting models can be used. This is a critical question because AI-created data is already fueling a generative AI boom.16

Whether AI-generated works are eligible for copyright protection varies by country. In general, however, eligibility requires substantial human involvement.

There are doubts about whether works generated by AI tools should be eligible for copyright protection at all. The possible options are:

  • AI-generated works do not meet the requirements for copyright protection because they are not the result of human creativity.
  • AI-generated works should be eligible for copyright protection because they are the product of complex algorithms and programming. Moreover, the creators of these algorithms and programs should be recognized as the authors of the works generated by the AI.

For example, the U.S. Copyright Office published the second part of its AI report on January 29, 2025. The report clarified that AI-generated outputs qualify for copyright only if a human provides sufficient creative input; prompts alone do not count.17

This means that copyright laws do not currently protect works created solely by a machine. But if an individual can demonstrate substantial human involvement in its creation, then it is plausible they may receive copyright protection.

In September 2022, the US Copyright Office made history by issuing a groundbreaking registration for the comic book Zarya of the Dawn, created using the text-to-image AI tool Midjourney.18 The author clarified that the artwork was AI-assisted, not solely AI-generated. She structured the story, designed the page layouts, and made artistic decisions to arrange the elements alongside the AI-generated images.

Figure 1. Drawings from the last page of AI-generated comic book Zarya of the Dawn. (Source: Zarya of the Dawn)

Another controversial generative art example is an AI-generated print that won an art competition at the Colorado State Fair.19 The creator said that he spent several weeks refining prompts and manually selecting the final image. The award-winning AI-generated artwork is shown in Figure 2 below.

Figure 2. The award-winning AI-generated print Theatre d’Opera Spatial. (Source: The Verge)

This image was denied copyright protection.20 Ultimately, whether AI-generated works are eligible for copyright protection raises the question of ownership rights and who would own the copyright in these cases. Countries that require human agency for authorship generally deny copyright protection to AI-generated works.

The authorship and ownership rights of AI-generated works are also disputable.

Under the copyright law of most countries, the creator of a work is generally considered the copyright owner. However, when a work is created by AI, it is unclear who the creator is. Such ambiguity can create problems in determining who has the right to exploit the work, and in enforcing copyright violations.

There can be different solutions to this problem:

  • AI itself as the creator of the work, in which case the AI owner would have the copyright.
  • AI model’s human programmer as the creator, in which case the programmer would be the owner of the copyright.
  • Humans that prepared the AI model’s training data as the creators.

Various countries, including Hong Kong, India, Ireland, New Zealand and the UK, explicitly grant authorship rights to programmers. For example, the United Kingdom provides copyright for works created entirely by computers. Yet, it deems that the author should be “the person by whom the arrangements necessary for the creation of the work are undertaken.”21

Consequently, there are several interpretations of who this “person” is. The generative model’s developer or operator? Or the model itself?

Stephen Thaler, creator of the Creativity Machine, is challenging the US Copyright Office’s stance on AI authorship. In June 2022, he sued after the office refused to register a digital image created by his system. Thaler asked for recognition of the Creativity Machine as the creator, not himself:

“My interest is the definition of what a person is,” he told Bloomberg Law, adding, “What I’m building is sentient machine intelligence. Maybe expansion to the term sentient organism would be in order.”22

Take the case where authorship and copyright are given to the model’s programmer. Besides the programmed algorithm, generative AI models rely on an immense amount of data to create new content. For example, look at the Next Rembrandt painting in the figure below.

Figure 3. “The Next Rembrandt” is a computer-generated, 3D-printed painting based on the real paintings of the 17th-century Dutch painter Rembrandt. (Source: The Guardian)

Given this highly artistic output, it is hard to give authorship solely to the programmer while bypassing the immense input from the real artist Rembrandt.

Currently, the ownership of copyright is as debatable as the eligibility of the generated works. Both vary from country to country and are open to reform according to the improvements in generative AI use.

We recommend that businesses take 3 primary steps:

Identify your business’s risk appetite for generative AI

This helps identify which use cases make sense for your business. For example, you may not want AI-generated code in your business’s most valuable proprietary code.

Leverage vendor commitments to minimize your business’s risk

To ease enterprises’ concerns, vendors like Adobe and Microsoft are committing to defend their clients if use of their solutions leads to legal issues.23

Embrace ethical AI

While building your business’s generative AI stack, pay attention to the data used in training and fine-tuning generative AI models from a copyright perspective.

Implement the 4 key principles of AI and manage your AI inventory.

Learn more about our recommendations regarding enterprise generative AI.

Why are copyrights important in generative AI?

Generative AI creates legal and ethical issues that must be addressed. One of the most important is copyright, which determines who owns the rights to creative works and how those works may be used. Companies that rely on generative AI tools without knowing the local legislation on generative AI copyright risk reputational damage or legal penalties.

Glossary

Copyright: A type of intellectual property (IP) that protects tangible forms of artistic, literary, or intellectual works, such as paintings, books, and software. Copyright lasts for decades, often up to 70 years after the author’s death.
Patents: IP protections for inventions and new processes, differing from copyright by covering functional aspects rather than creative expressions.
Fair use: A legal doctrine allowing limited use of copyrighted material without permission under certain conditions, such as for criticism, comment, news reporting, teaching, or research.
Generative AI: Artificial intelligence systems that create new text, images, videos, and other media, raising debates on copyrightability and ownership of the generated outputs.
Inputs in AI training: The data used to train generative AI models, which can include copyrighted material. Issues arise about whether using such data without permission constitutes copyright infringement.
Outputs in AI: The new works produced by generative AI, such as text or images, and the debate over their copyrightability, given that human authorship is typically required for copyright protection.
Transformative use: A type of fair use where the new work adds something new with a different purpose or character, not substituting for the original work.
Creative control: The level of influence a human has over the creation of a work, which impacts whether AI-generated outputs are deemed copyrightable.
Copyright registration: The process of officially registering a work with the U.S. Copyright Office, which currently requires human authorship for protection.

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
