Is Web Scraping Legal? Ethical Web Scraping Guide in 2024

Updated on Jan 5


Table of contents

1. First things first: Is web scraping legal?
2. History of major web scraping lawsuits
3. Latest regulations of Web Scraping by Country
4. Dos and Don’ts of Legal and Ethical Web Scraping
Further Reading

If you scrape the web, you have probably already seen how it benefits your business. If your website is being scraped, you may be frustrated that web scraping tools consume your server resources and that your information is used for others’ benefit. You may ask:

Is it legal?
Can your specific use case violate the rules?
Even if legal, is it ethical?
Would it harm your business’ reputation?

In this article, we give a short summary of major web scraping lawsuits, the latest legal status of web scraping by country, and common dos and don’ts for scraping legally and ethically.

Please note that this article is for informational purposes and should not be taken as legal advice. For your scraping projects, you are advised to get specific legal advice.

1. First things first: Is web scraping legal?

The short answer is yes. Scraping publicly available information on the web in an automated way is generally legal as long as the scraped data:

Is not used for any harmful purpose.
Is not used to directly harm the scraped website’s business or operations.
Does not include personally identifiable information (PII). Many countries regulate PII through data protection laws, the major ones being the GDPR in the EU and the CCPA in California. The US has no comprehensive federal data privacy law yet, but a combination of sector-specific federal laws and state-level regulations often protects PII. Therefore, it is important not to scrape PII, or, if it is scraped, to mask and protect it with privacy-enhancing technologies.
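The following minimal Python sketch illustrates masking. It assumes that the only PII of concern is e-mail addresses and phone-number-like strings. Each match is replaced with a salted SHA-256 hash prefix, so records can still be matched or deduplicated without exposing the raw value. The regular expressions, the `mask_pii` function, and the salt are hypothetical choices for illustration. Real PII detection needs much broader coverage, and under the GDPR, pseudonymized data like this still counts as personal data.

```python
import hashlib
import re

# Illustrative patterns only: e-mail addresses and phone-number-like digit runs.
# Real PII detection must also cover names, postal addresses, ID numbers, and more.
PII_PATTERN = re.compile(
    r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<phone>\+?\d[\d\s().-]{7,}\d)"
)


def mask_pii(text: str, salt: str) -> str:
    """Replace each detected PII value with a salted hash tagged by its type."""

    def replace(match: re.Match) -> str:
        kind = match.lastgroup  # "email" or "phone", whichever group matched
        digest = hashlib.sha256((salt + match.group(0)).encode("utf-8")).hexdigest()
        return f"<{kind}:{digest[:12]}>"

    # A single substitution pass means replacement text is never re-scanned.
    return PII_PATTERN.sub(replace, text)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or +1 (555) 123-4567 for details."
    print(mask_pii(sample, salt="keep-this-secret"))
```

Keep the salt secret and stable. The same value and salt always produce the same token, which preserves joins across records. Without the salt, short values such as phone numbers cannot simply be recovered by hashing every possible number.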

2. History of major web scraping lawsuits

Though web scraping can be legal, many companies do not want to be scraped. If a platform can show that being scraped by a bot damages its infrastructure or operations, a court may find that activity illegal. Below, we collected the most significant web scraping lawsuits, including several in which the court sided with the scraped website. Businesses should keep in mind that, without an overarching law, cases similar to those below may not result in the same court decision, since each one is evaluated on a case-by-case basis.

Meta vs Bright Data Case: Meta Platforms sued Bright Data, accusing it of illegally extracting data from its Facebook and Instagram platforms. Bright Data contested Meta’s claims about its data scraping rights, and both parties went to court: Meta aims to stop Bright Data’s data collection activities, while Bright Data seeks a court declaration affirming the legality of harvesting public data from Facebook.¹ X Corp., formerly Twitter, has also recently filed a lawsuit in California against Bright Data, an Israeli company specializing in web scraping services. Or Lenchner, the CEO of Bright Data, told Bloomberg Law that this lawsuit is an attempt to restrict access to publicly available data on Twitter.²
eBay vs Bidder’s Edge Case: One of the earliest publicly known web scraping lawsuits was eBay’s case against Bidder’s Edge, an online auction price comparison website for consumers. In 2000, the court issued an injunction preventing Bidder’s Edge from scraping eBay content. eBay’s winning argument was that Bidder’s Edge’s crawling burdened eBay’s systems, and that others following Bidder’s Edge’s example could cause even more harm to those systems.
Facebook vs Power Ventures Case: In 2009, Facebook sued Power Ventures for scraping content that Facebook users had uploaded to its website. The case set an example of web scraping being evaluated from an intellectual property standpoint. The court sided with Facebook and ordered Power Ventures to pay damages.
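LinkedIn vs hiQ Labs Case: One of the most prominent web scraping disputes began in 2017, when LinkedIn tried to stop hiQ Labs, a data analytics company, from scraping publicly available LinkedIn profiles for professional skill analysis, and hiQ took the matter to court. The case was reviewed by several courts, including the Supreme Court, which sent it back to the Ninth Circuit. The Ninth Circuit held that scraping data that is publicly accessible on the internet likely does not violate the US Computer Fraud and Abuse Act (CFAA), although a district court later found that hiQ had breached LinkedIn’s user agreement.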

3. Latest regulations of Web Scraping by Country

United States: There is no federal law that specifically targets web scraping in the United States, and scraping is generally permitted as long as the scraped data is publicly available and the scraping activity does not harm the website being scraped. One specific exception is the Better Online Ticket Sales (BOTS) Act of 2016, which prohibits using bots to get around the purchase limits set by ticket sellers, with the aim of curbing ticket black markets.

European Union and the UK: The EU has adopted the Directive on Copyright in the Digital Single Market (2019), part of its effort to bring all EU countries under a Digital Single Market with shared rules. Articles 3 and 4 of this directive create exceptions for text and data mining, allowing reproductions of lawfully accessible content for that purpose. Under Article 4, rights holders can opt out, for example in a machine-readable way for content published online. The directive approaches the topic from an intellectual property point of view. Web scraping that involves personal data must also comply with the GDPR, which makes processing such data without a lawful basis illegal. Apart from that, the situation in the EU and the UK is similar to that in the US.

China: Based on English-language sources, China also appears to have no regulation that directly targets web scraping. As in other countries, web scraping seems to be used for business purposes in China, while scraping and processing personal data is not legal.

4. Dos and Don’ts of Legal and Ethical Web Scraping

From a legal standpoint, one question businesses should ask themselves is whether their scraping harms the scraped website. The website may have grounds to file a lawsuit against the scraper, even where no specific regulation applies, in two situations: if the scraping is intense enough to interrupt the website’s services, or if the scraped data is used to duplicate the website’s activity or service.

From an ethical standpoint, web scraping already has many use cases and professional providers in the market, so we would argue that there is nothing wrong with using it for business purposes. There are also technical web scraping best practices that make scraping more efficient and, where an API is available, reduce the load on the scraped website, such as:

Using the website’s API instead of scraping, when one is available.
Integrating web scrapers with proxy servers.
Using headless browsers.

To learn more about how to improve your web scraping projects, check out our top 7 web scraping best practices.

As long as you work with a trusted web scraping provider or make sure your own technical team takes these practices into consideration, you can defend your web scraping as ethical for your business purposes.

Dos:

Scrape only the data you need by determining the exact business case and customizing your web crawler technology for it. This will minimize your risk of exhausting the scraped website with unwanted traffic.
Always read the terms of use of the website you scrape. Many websites also publish a robots.txt file that states which parts of the site automated crawlers may access. Your web crawling solution or technical experts should help you abide by those permissions.
Be transparent about your web scraping and be ready to explain your scraping process to assure others that your approach is legal and ethical.
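Checking robots.txt can be automated. The sketch below uses Python’s standard-library `urllib.robotparser` to test URLs against a sample robots.txt before fetching them. The robots.txt content, the `example-scraper` user agent, and the example.com URLs are placeholders. For a live site, you would call `set_url()` and `read()` to download the real file. Keep in mind that robots.txt expresses the site owner’s crawling preferences and does not replace reading the terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. For a real site you would instead call
#   parser.set_url("https://example.com/robots.txt"); parser.read()
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "example-scraper"  # placeholder; identify your crawler honestly


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True only if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/products", "https://example.com/private/data"):
        print(url, "allowed" if allowed(rules, url) else "disallowed")
    print("crawl delay (seconds):", rules.crawl_delay(USER_AGENT))
```

With these sample rules, the script reports the product page as allowed and anything under /private/ as disallowed. It also returns a crawl delay of 5 seconds, which you can feed into how your scraper schedules its requests.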

Don’ts:

Do not exhaust the scraped website with overly frequent or extensive requests. This will also increase the likelihood that your crawler will be blocked by the scraped website.
Do not collect personally identifiable information, or, if you have permission to collect it, make sure to mask the data to minimize exposure during processing.
Do not expose the scraped data to the public. Make sure that it is stored as securely as your own company data. You never know for what purposes it may be used if leaked.
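One way to avoid exhausting a website is to enforce a minimum interval between requests to the same host. Below is a minimal Python sketch of such a throttle, assuming a single-threaded scraper. The `PoliteThrottle` class and its default 2-second interval are hypothetical choices. An appropriate interval depends on the site, for example a Crawl-delay directive in its robots.txt, if present.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforce a minimum interval between requests to the same host.

    Assumes a single-threaded scraper; the default interval is a placeholder
    to be tuned for each site.
    """

    def __init__(
        self,
        min_interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> float:
        """Sleep until the URL's host may be contacted again; return seconds slept."""
        host = urlsplit(url).netloc
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - self._clock())
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = self._clock()
        return delay
```

Call `throttle.wait(url)` immediately before each request made with your HTTP client of choice. Requests to different hosts are not delayed by each other. Injecting the clock and sleep functions keeps the class easy to test without real delays.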

Sponsored:

If you partner with a service provider for web scraping, make sure to leverage their technical expertise and legal experience. For example, Bright Data assigns a compliance officer to its customers to answer any questions about the legal aspects of web scraping along the way.

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per Similarweb), including 60% of the Fortune 500, every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, and the Washington Post; global firms like Deloitte and HPE; NGOs like the World Economic Forum; and supranational organizations like the European Commission.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He also led the commercial growth of deep tech company Hypatos, which reached 7-digit annual recurring revenue and a 9-digit valuation from zero within 2 years. Cem's work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.