If you’re scraping the web, you’ve likely seen how it has benefited your business. However, as of 2026, the legal landscape has shifted dramatically.
While historical cases focused on “unauthorized access,” new landmark lawsuits involving AI training and technical circumvention are redrawing the lines of what is permissible.
See below for the most recent web scraping lawsuits (including the Reddit v. Perplexity and NYT v. OpenAI cases), and the current legal landscape by country to help you stay on the right side of the law.
Disclaimer: Our work is for informational purposes only and not legal advice; please get professional legal advice for specific guidance.
Is web scraping legal?
Web scraping is generally legal when you scrape publicly available data on the web. However, its legality ultimately depends on how, what, and why you’re scraping.
Web scraping can be legal when you:
- Prioritize logged-out scraping: Scrape publicly available data from webpages accessible without a login, subscription, or payment.
- Avoid technical circumvention: Do not bypass technical barriers such as rate limits or CAPTCHAs, and respect the website’s terms of service, robots.txt file, and copyright laws.
- Align with commercial use policies: Ensure your scraping intent (e.g., search indexing vs. AI model training) aligns with the site’s commercial use policies. Cases like Reddit v. Anthropic are currently defining new boundaries for “Fair Use” when data is explicitly scraped for AI development.
- Comply with global privacy laws: Don’t collect personal or sensitive data, such as names or contact information, in a manner that violates privacy laws, including the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
For more on ethical data collection, check out our ethical & compliant web data benchmark.
Latest web scraping legal updates
Although web scraping can be legal, companies generally do not want their sites to be scraped. If a platform can show that scraping by a bot damages its infrastructure or operations, a court may find that activity illegal.
Here, we have compiled the most significant web scraping lawsuits, most of them from the U.S., including cases in which the court sided with the scraped website.
Reddit vs. Perplexity AI & scraping services
Court: U.S. District Court for the Southern District of New York
Timeline: October 2025 – Present (Active Case)
Reddit sued the AI search engine Perplexity AI and three major scraping/proxy providers (SerpApi, Oxylabs, AWMProxy) for industrial-scale data collection and bypassing technical barriers. 1
Conflict:
Reddit alleges that the defendants engaged in a “bank robbery-style” scheme to steal copyrighted content. Instead of entering into licensing agreements (like OpenAI and Google), Perplexity used specialized scraping tools to bypass Reddit’s defenses.
Legal arguments:
- Indirect scraping via Google: Defendants allegedly bypassed Reddit’s own blocks by scraping Reddit’s content from Google search engine results pages (SERPs) instead of from Reddit directly.
- DMCA violations: Unlike previous “public data” cases (such as hiQ), Reddit is invoking Section 1201 of the Digital Millennium Copyright Act (DMCA). It argues that the defendants didn’t just “access” data, but purposefully bypassed “technological measures” (rate limits, CAPTCHAs, and SearchGuard).
- Refusal to license: Reddit highlights that while other AI giants pay for data access, Perplexity increased its scraping volume 40-fold after receiving a cease-and-desist letter, choosing “circumvention over cooperation.”
Current status:
As of late 2025, the case is ongoing, and no final ruling has been issued.
Reddit vs. Anthropic
Court: Superior Court of California in San Francisco
Timeline: June 2025 – Present (Active Litigation)
Reddit sued the AI startup Anthropic, accusing it of unlawfully using data from its 100 million daily users to train its AI systems.2
Unlike Google and OpenAI, who have paid licensing deals with Reddit, Anthropic allegedly declined to enter into an agreement. Reddit’s legal team argues that without a formal agreement, there are no guardrails to ensure user privacy protections.
Current status:
As of late 2025, there has been no final court ruling. The case is currently in the pre-trial discovery phase. Anthropic has moved to have parts of the case dismissed, arguing that factual data is not copyrightable.
LinkedIn vs hiQ Labs Case
Court: U.S. District Court / Ninth Circuit Court of Appeals
Timeline: 2017–2022
LinkedIn sued hiQ Labs, a data analytics company, for scraping publicly available profiles to conduct a professional skill analysis.3 Several courts, including the Supreme Court, reviewed the case:
- The court initially sided with hiQ, ruling that scraping public data does not violate the Computer Fraud and Abuse Act (CFAA).4
- In 2022, the Ninth Circuit reaffirmed this, stating that accessing publicly available data without authorization is not “unauthorized access” under CFAA.
The case did not end in hiQ’s favor, however. In late 2022, the district court found that hiQ had breached LinkedIn’s user agreement. Even where the CFAA does not apply, breaching a website’s terms of service can result in legal consequences, and hiQ’s violations of LinkedIn’s user agreement played a significant role in the final judgment.
Meta vs Bright Data
Court: U.S. District Court for the Northern District of California
Timeline: 2023–2024
Case Type: Civil lawsuit involving breach of contract and unauthorized data scraping
In January 2023, Meta initiated a lawsuit against Bright Data, alleging that it had illegally extracted data from Meta’s Facebook and Instagram platforms. Interestingly, Bright Data contested Meta’s claims about its data scraping rights, leading both parties to court.
The court ruled in favor of Bright Data, finding insufficient evidence to show that Bright Data had scraped nonpublic data or accessed data while logged into user accounts. In February 2024, Meta decided to drop the remaining claims against Bright Data.5
Does Meta (Facebook/Instagram) prohibit all automated data collection?
If you’ve read the Instagram terms of use, you’ve likely seen the clause stating that ‘scraping by automated means is prohibited.’
However, the legal reality is more complex. In the landmark Meta v. Bright Data (2024) case, the court ruled that if you are scraping public data while logged out, Meta’s terms do not necessarily apply because you never signed a contract by logging in.
Many websites include a warning along the lines of Facebook’s terms: ‘automated data collection, scraping prohibited.’ But as seen in recent web scraping legal updates, courts are increasingly distinguishing between data behind a login wall and data available on the open web.
X Corp. (formerly Twitter) vs Bright Data
Court: U.S. District Court for the Northern District of California
Timeline: 2023–ongoing
Case Type: Unauthorized data access under computer fraud statutes, intellectual property violations
In July 2023, X Corp. filed a lawsuit against Bright Data, alleging that Bright Data violated its terms of service by scraping and selling vast amounts of data from the X platform. 6 The legal action in California was about Bright Data’s access to public data on Twitter.
The court dismissed the case, ruling that X failed to plausibly allege that Bright Data had violated its user agreement. The court held that X Corp. could not use its terms of service to prevent the scraping of public data, since X Corp. does not own the content its users post and therefore cannot enforce copyright in it.
Claiming ownership of user content would undermine X Corp.’s safe harbor protection, which shields social media companies from liability for copyright infringement and other unlawful acts committed by their users. The court therefore again ruled in favor of a party that collected public data from a social network.
eBay vs Bidder’s Edge Case
Court: United States District Court for the Northern District of California
Timeline: 1999–2000
Case type: Civil lawsuit for trespass to chattels, in which eBay accused Bidder’s Edge of unlawfully scraping its site using automated data collection bots.
Bidder’s Edge (BE), an online price comparison website, used web scraping tools to aggregate auction listings from various platforms, including eBay, without permission. 7 eBay claimed that BE’s automated bots made unauthorized use of its systems.
The court issued an order preventing Bidder’s Edge from scraping eBay content again. eBay’s winning argument was that Bidder’s Edge was overloading its system, and that others following the example of Bidder’s Edge could cause further harm to it.
Facebook vs Power Ventures Case
Court: U.S. District Court for the Northern District of California; later appealed to the U.S. Court of Appeals for the Ninth Circuit
Timeline: 2008–2017
Case Type: Civil lawsuit under the CFAA and California’s anti-hacking law, with Facebook alleging unauthorized access to its platform.
In late 2008, Facebook sued Power Ventures for scraping content that users had uploaded to its website. The case set an example of web scraping being evaluated from an intellectual property standpoint. The court sided with Facebook and ordered Power Ventures to pay a financial penalty.8
Latest regulations on web scraping by country
United States
Legal Status: The web scraping of publicly available data is generally considered legal.
No federal law specifically prohibits web scraping in the United States, as long as the scraped data is publicly available and the scraping activity does not harm the website being scraped. One specific exception is the Better Online Ticket Sales (BOTS) Act of 2016, which targets the use of bots to buy an excessive number of tickets at once and aims to prevent black markets.9
European Union and the UK
Legal Status: In the EU and UK, web scraping of publicly available, non-personal, and non-copyrighted content is generally legal, but scraping personal data without a lawful basis is prohibited under GDPR.
The EU’s Directive on Copyright in the Digital Single Market (Directive (EU) 2019/790) aims to harmonize copyright rules across EU member states. Articles 3 and 4 of this directive introduce exceptions that permit reproducing lawfully accessible content for text and data mining, although under Article 4 rightsholders can expressly reserve their rights.10 11
This directive approaches the topic from an intellectual property perspective. Scraping personal data without a lawful basis remains prohibited under the GDPR. Apart from that, the situation in the EU and the UK is similar to that in the US.
Dos and don’ts of legal and ethical web scraping
From a legal standpoint, one question businesses should ask themselves is whether their scraping harms the scraped website. Even where no specific regulation exists, the website would have grounds to file a lawsuit against the scraper if:
- The scraping is so intense that it interrupts the services of the scraped website.
- The scraped data is used to duplicate the activity or service of that website.
From an ethical standpoint, given that web scraping already has many use cases and professional providers in the market, there is no shame in using it for business purposes. There are technical web scraping best practices that will ease the traffic load on the scraped website, such as:
- Using the website’s APIs rather than web scraping, when available.
- Integrating web scrapers with proxy servers.
- Using headless browsers.
As long as you find a trusted web scraper to work with or make sure your technical resources consider these, you can defend your web scraping as ethical for your business purposes.
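One way to keep the traffic load on a scraped website low is to enforce a minimum pause between requests. The sketch below is a minimal illustration, not a complete crawler: the `Throttle` class and its parameters are hypothetical names introduced here, and the right interval depends on the target site.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests to one site.

    Hypothetical helper for illustration; `clock` and `sleep` are injectable
    so the behavior can be checked without real waiting.
    """

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None  # time of the previous request, if any

    def wait(self):
        """Sleep until min_interval has passed since the last call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        delay = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last_request = self._clock()
        return delay
```

In use, you would create one `Throttle` per target site and call `wait()` immediately before each request, so that bursts are spread out no matter how fast the rest of the pipeline runs.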
Dos:
- Scrape only the data you need by defining the exact business case and customizing your web crawler technology accordingly. This will minimize your risk of exhausting the scraped website with unwanted traffic.
- Always read the terms of use of the scraped website. In addition to commercial terms of use, websites also have a robots.txt file that specifies permissions for the website’s content. Your web crawling solution or technical experts should help you comply with these permissions.
- Be transparent about your web scraping and be ready to explain your scraping process to assure others that your approach is legal and ethical.
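The robots.txt check mentioned above can be automated. The sketch below uses Python’s standard `urllib.robotparser` module; the robots.txt content, the `example-bot` user agent, and the example.com URLs are made up for illustration. Note that passing this check says nothing about the site’s terms of use, which still need to be read separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "example-bot", "https://example.com/products/1"))
print(is_allowed(parser, "example-bot", "https://example.com/private/report"))
print(parser.crawl_delay("example-bot"))
```

Against a live site, `RobotFileParser` can also load the file itself via `set_url()` followed by `read()`; parsing a string, as here, keeps the example self-contained.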
Don’ts:
- Do not overload the scraped website with overly frequent or overly extensive requests. Doing so also increases the likelihood that the scraped website will block your crawler.
- Do not collect personally identifiable information. If you have a lawful basis to collect it, ensure that you mask the data to minimize exposure during processing. Note that robots.txt governs crawler access, not privacy permissions.
- Do not expose the scraped data to the public. Make sure that it is stored securely, just like your own company data. You never know what purposes it may be used for if it is leaked.
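As a rough illustration of masking personal data during processing, the sketch below replaces email addresses in scraped text with salted hash tokens. The `mask_emails` function, the regular expression, and the token format are assumptions made for this example. The pattern is deliberately simple and will not catch every address, and salted hashing is pseudonymization rather than anonymization, so the output may still count as personal data under laws such as the GDPR.

```python
import hashlib
import re

# Simplified email pattern, for illustration only; real-world addresses vary more.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text, salt):
    """Replace each email address in text with a salted, truncated hash token.

    The same address (case-insensitive) always maps to the same token, so
    records can still be grouped without storing the raw address.
    """

    def _replace(match):
        normalized = match.group(0).lower()
        digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
        return f"<email:{digest[:12]}>"

    return EMAIL_PATTERN.sub(_replace, text)
```

The salt should be kept secret and stored separately from the masked data; without it, tokens for common addresses could be recomputed by anyone who knows the scheme.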
Sponsored
When partnering with a service provider for web scraping, ensure you leverage their technical expertise and legal experience. For example, Bright Data dedicates a compliance officer to its customers to ensure they have no questions about the legal processes of web scraping along the way.
Organizations for Ethical Web Scraping
Leading web data infrastructure companies have formed associations to align their industry and stakeholders on the ethical use of web scraping. These associations are:
- Alliance for Responsible Data Collection, which includes Bright Data and Common Crawl among other stakeholders.
- Ethical Web Data Collection Initiative (EWDCI), which includes Oxylabs, NetNut, ProxyEmpire, Zyte, among others.
What if a website’s terms of service forbid scraping?
If a website’s terms of service (ToS) explicitly prohibit scraping, accessing, or collecting data from that site through automated means, doing so may constitute a violation of those terms.
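For instance, in the United States, unauthorized access to a computer system can be a federal offense under the Computer Fraud and Abuse Act (CFAA), although, as the hiQ case above shows, courts have been reluctant to apply it to publicly available data. To reduce your risk, you can contact the site owner to request permission or use official APIs for data access.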