If you’re scraping the web, you’ve probably already seen how it has benefited your business. However, if your site is being scraped, it may raise concerns about legality, ethics, and potential harm.
In this article, we explain the latest web scraping lawsuits, the current legal landscape by country, and key web scraping best practices to help you stay on the right side of the law and ethics. (This content is for informational purposes only and is not legal advice; consult a legal professional for specific guidance.)
For more on ethical data collection, check out our ethical & compliant web data benchmark.
Is web scraping legal?
Web scraping is generally legal when you scrape publicly available data on the web. However, its legality ultimately depends on how, what, and why you’re scraping.
Web scraping can be legal when you:
- Scrape publicly available data from webpages that can be accessed without requiring a login, subscription, or payment.
- Respect the website’s terms of service, robots.txt file, and copyright laws.
- Don’t collect personal or sensitive data, such as names or contact information, in a manner that violates privacy laws, including the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
History of major web scraping lawsuits
Though web scraping can be legal, companies generally do not want to be scraped. If a platform can show that being scraped by a bot damages its infrastructure or operations, a court may find that activity illegal.
Here, we have compiled the most significant web scraping lawsuits, most of them from the U.S., including cases in which the court sided with the scraped website.
LinkedIn vs hiQ Labs Case
Court: U.S. District Court / Ninth Circuit Court of Appeals
Timeline: 2017–2022
LinkedIn sued hiQ Labs, a data analytics company, for scraping publicly available profiles to conduct a professional skill analysis.1 Several courts, including the Supreme Court, reviewed the case:
- The court initially sided with hiQ, ruling that scraping public data does not violate the Computer Fraud and Abuse Act (CFAA).2
- In 2022, the Ninth Circuit reaffirmed this, stating that scraping publicly available data is not access “without authorization” under the CFAA.
However, the case did not end in hiQ’s favor. Later in 2022, the district court found that hiQ had breached LinkedIn’s user agreement, and that breach played a significant role in the final judgment. The takeaway: even when the CFAA does not apply, breaching a website’s terms of service can result in legal consequences.
Meta vs Bright Data
Court: U.S. District Court for the Northern District of California
Timeline: 2023–2024
Case Type: Civil lawsuit involving breach of contract and unauthorized data scraping
In January 2023, Meta initiated a lawsuit against Bright Data, alleging that it had illegally extracted data from Meta’s Facebook and Instagram platforms. Interestingly, Bright Data contested Meta’s claims about its data scraping rights, leading both parties to court.
The court ruled in favor of Bright Data, determining that there was insufficient evidence to prove that Bright Data had scraped non-public data or accessed data while logged into user accounts. Meta decided to drop the remaining claims against Bright Data in February 2024.3
X Corp., formerly Twitter vs Bright Data
Court: U.S. District Court for the Northern District of California
Timeline: 2023–ongoing
Case Type: Unauthorized data access under computer fraud statutes, intellectual property violations
In July 2023, X Corp. filed a lawsuit against Bright Data, alleging that Bright Data violated its terms of service by scraping and selling vast amounts of data from the X platform.4 The legal action in California centered on Bright Data’s access to public data on X (formerly Twitter).
The case was dismissed: the judge ruled that X had failed to plausibly allege that Bright Data violated its user agreement. The court held that X Corp.’s terms of service could not prevent the scraping of public data, since X Corp. was not the owner of the content and therefore could not enforce copyright in it.
Owning user content would invalidate X Corp.’s safe harbor protection, which enables social media companies to distance themselves from copyright infringement and other unlawful acts committed by their users. Thus, a court once again ruled in favor of a party that collected public data from a social network.
eBay vs Bidder’s Edge Case
Court: United States District Court for the Northern District of California
Timeline: 1999–2000
Case Type: The case was a civil lawsuit over trespass to chattels, with eBay accusing Bidder’s Edge of unlawfully scraping its site using automated data collection bots.
Bidder’s Edge (BE), an online price comparison website, used web scraping tools to aggregate auction listings from various platforms, including eBay, without permission.5 eBay claimed that BE’s automated bots made unauthorized use of its systems.
The court issued an order preventing Bidder’s Edge from scraping eBay content again. eBay’s winning argument was that Bidder’s Edge was straining its systems, and that others following Bidder’s Edge’s example could cause further harm.
Facebook vs Power Ventures Case
Court: U.S. District Court for the Northern District of California; later appealed to the U.S. Court of Appeals for the Ninth Circuit
Timeline: 2008–2017
Case Type: The case was a civil lawsuit under the CFAA and California’s anti-hacking law, with Facebook alleging unauthorized access to its platform.
In late 2008, Facebook sued Power Ventures for scraping content that users had uploaded to its website. The case set an example of web scraping being evaluated from an intellectual property standpoint. The court sided with Facebook and ordered Power Ventures to pay a monetary penalty.6
Latest web scraping regulations by country
United States
Legal Status: The web scraping of publicly available data is generally considered legal.
No federal law in the United States prohibits web scraping as such, as long as the scraped data is publicly available and the scraping activity does not harm the website being scraped. One specific law, the BOTS Act of 2016, targets the use of bots to purchase excessive numbers of tickets at once, with the aim of preventing black markets.7
European Union and the UK
Legal Status: In the EU and UK, web scraping of publicly available, non-personal, and non-copyrighted content is generally legal, but scraping personal data without a lawful basis is prohibited under GDPR.
The EU has passed the Digital Services Act package and the Directive on Copyright in the Digital Single Market, which aim to bring all EU countries under the same digital regulations. According to Articles 3 and 4 of the copyright directive, which cover text and data mining, “reproduction of publicly available content” is not illegal.8 9
The directive approaches the topic from an intellectual property perspective and does not override the GDPR, so scraping personal data without a lawful basis remains illegal. Apart from that, the situation in the EU and the UK is similar to that in the US.
Dos and don’ts of legal and ethical web scraping
From a legal standpoint, one question businesses should ask themselves is whether their scraping harms the scraped website. Even where no specific regulation exists, the website would have grounds to file a lawsuit against the scraper if:
- The scraping is so intense that it interrupts the services of the scraped website.
- The scraped data is used to duplicate the activity or the service of that website.
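One common way to keep scraping from becoming too intense is to enforce a minimum delay between requests to the same host. The sketch below is illustrative only: the class name `PoliteThrottle` and the two-second default are assumptions for this example, not a legal threshold or an industry standard, and the appropriate delay depends on the site.

```python
import time
from urllib.parse import urlparse


class PoliteThrottle:
    """Hypothetical helper: enforce a minimum delay between requests per host."""

    def __init__(self, min_delay_seconds=2.0, clock=time.monotonic, sleep=time.sleep):
        # The 2-second default is an assumption; tune it for each target site.
        self.min_delay_seconds = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request_at = {}  # host -> time of the most recent request

    def wait(self, url):
        """Pause if the last request to this host was too recent.

        Returns the number of seconds spent waiting.
        """
        host = urlparse(url).netloc
        last = self._last_request_at.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay_seconds - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment the caller is cleared to send the request.
        self._last_request_at[host] = self._clock()
        return waited
```

Calling `throttle.wait(url)` immediately before each HTTP request spaces out traffic per host, while requests to different hosts are not delayed by one another.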
From an ethical standpoint, given that web scraping already has many use cases and professional providers in the market, we can claim that there is no shame in using web scraping for business purposes. There are technical web scraping best practices that will ease the traffic load on the scraped website, such as:
- Using the website’s APIs rather than web scraping, when available.
- Integrating web scrapers with proxy servers.
- Using headless browsers.
As long as you find a trusted web scraper to work with or make sure your technical resources consider these, you can defend your web scraping as ethical for your business purposes.
Dos:
- Scrape only the data you need by determining the exact business case and customizing your web crawler technology for it. This will minimize your risk of exhausting the scraped website with unwanted traffic.
- Always read the terms of use of the scraped website. Apart from commercial terms of use, websites also have a robots.txt file, which contains information about the permissions for the website’s content. Your web crawling solution or technical experts should assist you in adhering to these permissions.
- Be transparent about your web scraping and be ready to explain your scraping process to assure others that your approach is legal and ethical.
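As a rough illustration of adhering to robots.txt permissions, the sketch below uses Python’s standard `urllib.robotparser` module. The sample rules, the `example-bot` user agent, and the helper names are assumptions made for this example. Passing a robots.txt check does not replace reading the website’s terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/products/1", "https://example.com/private/data"):
        print(url, "allowed:", is_allowed(parser, "example-bot", url))
    # Crawl-delay, when declared, suggests how long to pause between requests.
    print("crawl delay:", parser.crawl_delay("example-bot"))
```

Against a live site, you would typically call `set_url()` with the site’s robots.txt address and then `read()` instead of parsing a hard-coded string.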
Don’ts:
- Do not overload the scraped website with overly frequent or extensive pulls. Doing so also increases the likelihood that the website will block your crawler.
- Do not collect personally identifiable information. If you do have a lawful basis to collect it, mask the data to minimize exposure during processing. Note that robots.txt governs crawler access, not privacy rights.
- Do not expose the scraped data to the public. Make sure that it is stored securely, just like your own company data. You never know for what purposes it may be used if leaked.
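To illustrate the masking step mentioned above, here is a minimal sketch that replaces email addresses and phone-like numbers in scraped text before it is stored. The regular expressions and the `mask_pii` name are assumptions for this example. They catch only simple patterns and are not a complete PII detector, and masking alone does not establish a lawful basis for collection.

```python
import re

# Simplified patterns (assumptions): they miss many real-world formats.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    print(mask_pii("Contact jane.doe@example.com or +1 415-555-0100 today"))
```

Applying the mask as early as possible in the pipeline, before the data reaches logs or storage, limits how many systems ever see the raw values.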
Sponsored
When partnering with a service provider for web scraping, ensure you leverage their technical expertise and legal experience. For example, Bright Data dedicates a compliance officer to its customers to ensure they have no questions about the legal processes of web scraping along the way.
Organizations for Ethical Web Scraping
Leading web data infrastructure companies have formed associations to align their industry and stakeholders on the ethical use of web scraping. These associations are:
- Alliance for Responsible Data Collection, which includes Bright Data and Common Crawl among other stakeholders.
- Ethical Web Data Collection Initiative (EWDCI), which includes Oxylabs, NetNut, ProxyEmpire, and Zyte, among others.
What if a website’s terms of service forbid scraping?
If a website’s terms of service (ToS) explicitly prohibit scraping, accessing, or collecting data from that site through automated means, doing so may constitute a breach of those terms and expose you to legal claims.
For instance, in the United States, unauthorized access to a computer system can be a federal offense under the Computer Fraud and Abuse Act (CFAA). To reduce your risk, contact the site owner to request permission or use official APIs for data access.
External Links
- 1. hiQ Labs v. LinkedIn - Wikipedia. Contributors to Wikimedia projects
- 2. Web scraping is legal, US appeals court reaffirms | TechCrunch. TechCrunch
- 3. Meta, which pays for web scraping, sues to stop web scraping • The Register. The Register
- 4. California Federal Court Holds X’s Claims Against Scraper Preempted by Federal Law | Socially Aware.
- 5. https://en.wikipedia.org/wiki/EBay_v._Bidder’s_Edge#Order
- 6. Facebook, Inc. v. Power Ventures, Inc. - Wikipedia. Contributors to Wikimedia projects
- 7. S.3183 - 114th Congress (2015-2016): BOTS Act of 2016 | Congress.gov | Library of Congress.
- 8. https://digital-strategy.ec.europa.eu/en/policies/digital-services-act-package
- 9. https://www.europarl.europa.eu/legislative-train/theme-connected-digital-single-market/file-jd-directive-on-copyright-in-the-digital-single-market