We crawled more than 30 million web pages using more than 50 products from 6 leading web data infrastructure companies. This massive undertaking enabled us to assess critical performance metrics, including success rates, latency, and stability at scale.
Our goal was to determine which solutions truly handle the complexities of enterprise-level scraping. Below, you will find the comprehensive analysis of the leading products based on our findings, followed by a complete roadmap to web scraping fundamentals.
Web data collection benchmark results
| Vendor | API Coverage* | Unblocking Rate | Dynamic Scraper | Price** | Reliability |
|---|---|---|---|---|---|
|  | 89% | 98% | ✅ | 3.0 | High |
|  | 53% | 96% | ❌ | 2.8 | Normal |
|  | 37% | 95% | ✅ | 3.9 | High |
|  | 63% | N/A | ❌ | 6.3 | Normal |
| Zyte | 32% | 97% | ✅ | 1.5*** | N/A*** |
| NetNut | 11% | N/A*** | ❌ | 3.0 | Normal |
Notes on the benchmark table:
- (*) API Coverage: Represents the percentage of page types where a scraping API was available with a 90% or higher success rate.
- (**) Price: Prices are in thousands ($) for an Enterprise Proof of Concept (PoC) package. Prices are updated monthly based on public data.
- (***) Vendor Specifics: NetNut’s unblocker was not available for testing. Zyte’s API-based solution was not tested because load testing was conducted on residential proxies.
- Zyte does not offer proxies directly, so for pricing we assumed its proxy usage would be priced similarly to its API.
- Apify does not provide a web unblocker or mobile proxies; therefore, these products were assumed to be priced like its residential proxies.
Learnings from 30M web requests
Since the legality of collecting web data continues to be challenged, many businesses do not yet have a web data strategy and may not be aware of all solutions. Enterprises that need to collect web data typically value receiving structured, high-quality data with minimal technical effort via cost-effective, reliable services.
To achieve the goals above, enterprises need to:
- Outline the types of pages that they need to crawl
- Leverage web scraping APIs when they are available, since they minimize technical effort on the client side by providing structured data, and they are cost-effective: they cost roughly the same as residential proxies, even though residential proxies return only unstructured data.
Our experience: Before this benchmark, we relied on unblockers for our own company’s data collection needs. Our tech team was burdened every time our target websites changed their design. After realizing the scope of web scraping APIs and seeing that they are not more expensive than unblockers, we switched to using scraping APIs in our data collection workflows.
For the remaining pages, rely on:
- Web unblockers for hard-to-scrape pages, as they are the only solution that consistently returns successful results over 90% of the time without complex configuration. However, they are also the most expensive product in most providers’ toolkits.
- Datacenter or residential proxies for other pages, if the enterprise’s tech team is comfortable with configuring proxies and maintaining those configurations to ensure high success rates (a minimal integration sketch contrasting scraping APIs with proxies follows this list).
- Mobile proxies for mobile responses, plus other proxies for more niche use cases.
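To make the contrast above concrete, here is a minimal sketch of the two integration styles. All endpoints, parameter names, and credentials are hypothetical placeholders, not any specific vendor’s API:

```python
# Sketch of the two integration styles discussed above.
# Endpoints, tokens, and parameters are hypothetical placeholders.
import requests

TARGET_URL = "https://www.example.com/product/123"

# 1) Web scraping API: the provider fetches, unblocks, and parses the page,
#    returning structured JSON with minimal client-side effort.
api_response = requests.post(
    "https://api.scraping-vendor.example/v1/ecommerce",  # hypothetical endpoint
    json={"url": TARGET_URL, "format": "json"},
    headers={"Authorization": "Bearer <API_TOKEN>"},
    timeout=60,
)
product = api_response.json()  # e.g. {"title": ..., "price": ..., "rating": ...}

# 2) Residential proxy: the client receives raw HTML and must handle parsing,
#    retries, and anti-bot changes itself.
proxies = {
    "http": "http://<USER>:<PASS>@proxy.vendor.example:8000",   # hypothetical gateway
    "https": "http://<USER>:<PASS>@proxy.vendor.example:8000",
}
html = requests.get(TARGET_URL, proxies=proxies, timeout=60).text
# ...parse `html` yourself and maintain the parser as the page design changes.
```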
Compare web data providers’ performance, price & reliability
For web scraping APIs, you can choose:
- Bright Data for its market-leading range of web scraping APIs at cost-effective prices with detailed results. Many Bright Data SERP and e-commerce APIs return more data points than those of competitors.
- Apify for its market-leading range of web scraping APIs thanks to its community-driven scraper approach. However, success rates of some of its APIs were below our threshold for a successful API (i.e. below 90% success rate) and it was the most expensive provider in our benchmark.
- Zyte for its market-leading prices.
- Others opportunistically (e.g. Decodo returned the most data points for Instagram posts).
Among web unblockers, the leading products are:
- Bright Data is slightly more successful than most in real-world tests and significantly more successful in more difficult scenarios, such as scraping websites that regularly present JavaScript challenges. It also provides the second-lowest-priced unblocker in the benchmark.
- Zyte has the lowest-priced and fastest unblocker, responding within ~2 seconds on average in real-world tests.
Learn more about web unblockers and see detailed results.
Proxies: You can rely on any of the providers based on your technical team’s preferences and pricing. This is because results vary significantly based on:
- Time: While publishers improve their anti-scraping measures, web data infrastructure providers continually receive new IPs and refine their approaches. We used the same proxy type from the same provider on the same website with the same configuration for thousands of URLs in different runs. There were runs where almost all responses were correct and some where the success rate was ~50%. The success rate depended on the test time.
- Request: The success of a request sent via a proxy depends on how it is sent. For example, the choice of user-agent or the delay between requests significantly impacts the success rate (illustrated in the sketch below).
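A minimal sketch of these request-side factors (user-agent rotation and pacing), assuming a hypothetical residential proxy gateway:

```python
# Request hygiene over a proxy: rotate user-agents and pace requests.
# The proxy gateway and the pacing values are illustrative assumptions.
import random
import time
import requests

PROXIES = {
    "http": "http://<USER>:<PASS>@proxy.vendor.example:8000",   # hypothetical
    "https": "http://<USER>:<PASS>@proxy.vendor.example:8000",
}

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Version/17.4 Safari/605.1.15",
]

def fetch(url: str) -> tuple[int, str]:
    """Send one request with a rotated user-agent, then pause briefly."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    response = requests.get(url, headers=headers, proxies=PROXIES, timeout=30)
    time.sleep(random.uniform(1.0, 3.0))  # a delay between requests lowers block rates
    return response.status_code, response.text

status, html = fetch("https://www.example.com/category?page=1")
```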
As for reliability, all benchmarked providers’ services were reliable at 5,000 parallel requests. At 100,000 parallel requests, all services experienced some degradation, but Bright Data, Oxylabs, and Decodo exhibited greater reliability, showing minimal changes in success rate or response times.
Learn more about proxy providers and see detailed benchmark results.
However, this recommendation does not apply to niche use cases. For example, a company not included in our benchmark could offer higher-quality mobile proxies in Portugal. For niche cases, we recommend that teams experiment with different providers.
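For readers who want to probe reliability themselves, below is a rough load-testing sketch that caps concurrency with a semaphore and reports the success rate. The unblocker endpoint is a hypothetical placeholder; 5,000 mirrors the lower parallelism level used in the benchmark:

```python
# Rough load-testing sketch: measure success rate at a given parallelism level.
# The unblocker endpoint is a hypothetical placeholder, not a vendor's actual URL.
import asyncio
import aiohttp

UNBLOCKER = "http://<USER>:<PASS>@unblocker.vendor.example:22225"  # hypothetical

async def fetch_one(session, semaphore, url):
    async with semaphore:  # cap the number of in-flight requests
        try:
            async with session.get(
                url, proxy=UNBLOCKER, timeout=aiohttp.ClientTimeout(total=60)
            ) as resp:
                return resp.status == 200
        except Exception:
            return False

async def run_load_test(urls, parallelism):
    semaphore = asyncio.Semaphore(parallelism)
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch_one(session, semaphore, u) for u in urls))
    return sum(results) / len(results)  # success rate

# success_rate = asyncio.run(run_load_test(url_list, parallelism=5_000))
```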
How to choose the right data collection solution
1. Enterprise web data requirements:
Enterprises include diverse businesses. For example, businesses with e-commerce operations and hedge funds require high volumes of data to feed their models (e.g. dynamic pricing, stock replenishment). Their requirements include:
- Buyer-related dimensions:
  - High volume
  - Batch
  - Price & quality sensitivity
  - Want to receive structured data
- Website-related dimensions:
  - Easy & difficult-to-crawl
  - Static and dynamic
  - Mixed scraper availability
To achieve these requirements, enterprises need:
- Capabilities to support their requirements:
  - A wide selection of web scraping APIs that return detailed results with a high success rate, to deliver structured data and satisfy their quality sensitivity. Measurement: the share of targeted page types for which a web scraping API is provided; this depends on the types of pages each enterprise targets.
  - A powerful unblocker for difficult-to-crawl websites. Measurement: the crawler’s success rate across a wide range of web pages, including the most challenging ones.
  - Unblocker integration with a browser to enable interacting with websites for dynamic scraping (a browser-based sketch follows this list). Measurement: whether such a browser is available.
  - Cost-effective services to satisfy their price sensitivity. Measurement: the price to crawl a defined set of web pages.
- Reliability:
  - A resilient web data infrastructure to handle high-volume batch queries. Measurement: how the success rate degrades during load testing; the most resilient networks should not show drastic declines in success rate when answering tens of thousands of parallel queries.
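The browser-integration capability above can be sketched as follows. This is an illustrative Playwright example routed through a hypothetical proxy gateway with hypothetical selectors, not a specific vendor’s hosted-browser product:

```python
# Dynamic scraping through a browser routed over a provider proxy (Playwright).
# Proxy endpoint, credentials, and CSS selectors are hypothetical placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            "server": "http://proxy.vendor.example:8000",  # hypothetical gateway
            "username": "<USER>",
            "password": "<PASS>",
        }
    )
    page = browser.new_page()
    page.goto("https://www.example.com/search?q=laptops", wait_until="networkidle")
    page.click("button#load-more")      # interaction a static fetch cannot perform
    page.wait_for_selector("div.result")
    html = page.content()               # fully rendered HTML after the interaction
    browser.close()
```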
2. Web data requirements for small, highly technical teams:
If your data collection costs will determine your company’s profitability, and if you are a highly technical team, we recommend relying on proxies to reduce costs.
Finally, all buyers should pay attention to pricing; therefore, we calculated prices for the same packages across all major web infrastructure providers.
See pricing methodology for details.
Dimensions of web data requirements
We are not covering every type of web data use case here. Many web data users have multiple one-off requests over time. That is not the focus of this report.
We have seen that enterprises typically have recurring web data needs to monitor sentiment, prices, or other rapidly changing metrics. Therefore, we focus only on companies that use web data continuously. Their requirements vary along the dimensions below:
Buyer-related dimensions:
1. Volume:
- High volume, meaning 100 GB/month or more
- Low volume: anything below that threshold
2. Time sensitivity:
- Real-time: When web data, in raw or processed form, is served to human end users while they use applications, real-time responses are essential.
- Batch: Response times are not critical as long as results are received within tens of seconds. In most use cases, businesses batch process incoming web data to update their systems.
3. Price & quality sensitivity:
- Quality-sensitive: All web data solutions sometimes return empty responses when blocked by websites. Companies that want to spend limited time resending requests prefer solutions with higher success rates.
- Price-sensitive: Given that their other requirements are satisfied, these businesses want the lowest price and are willing to run their data collection systems multiple times to achieve higher-quality results.
- Price & quality sensitive: Businesses that want the optimal combination of high success rates and price.
4. Technical involvement:
- Want to build custom scrapers: The technical team is experienced in using proxies to bypass anti-scraping technologies and can create a custom internal solution. They are ready to devote effort to overcoming evolving anti-scraping approaches.
- Want to build HTML parsers: The technical team wants to receive HTML data to parse itself. They are ready to reparse web pages continuously whenever the page design changes (contrasted with structured data delivery in the sketch after this list).
- Want to receive structured data: Team wants to receive structured data (e.g., JSON files) to integrate into their applications.
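To illustrate the difference between the last two options, here is a small sketch of the “build HTML parsers” path. The CSS selectors are hypothetical and would need updating whenever the target page’s design changes:

```python
# "Build HTML parsers" option: the team receives raw HTML (e.g. via a proxy or
# unblocker) and extracts the fields itself. Selectors are hypothetical.
from bs4 import BeautifulSoup

def parse_product(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.select_one("h1.product-title").get_text(strip=True),
        "price": soup.select_one("span.price").get_text(strip=True),
    }

# With the "receive structured data" option, a scraping API would return this
# dictionary (as JSON) directly, with no parser for the team to maintain.
```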
Website-related dimensions:
5. Difficulty:
- Difficult-to-crawl websites like Amazon employ numerous anti-scraping technologies. Unblockers are necessary to consistently receive data from them with high success rates.
- Easy-to-crawl websites can be crawled with proxies
- Easy & difficult-to-crawl websites
6. Interactivity:
- Static websites make up most of the web and deliver data via changes in the URL.
- Dynamic websites require mouse or keyboard interaction to reveal additional information.
- Static and dynamic websites
7. Scraper availability:
- Available: A custom scraper exists for every webpage target type.
- Not available: There are no scrapers for any of the target webpage types.
- Mixed: For some targets, the scraper exists; for others, it doesn’t.
Methodology
This web data benchmark comprises the benchmarks below; the methodology for each is explained on its dedicated page:
- eCommerce scrapers
- Search engine scrapers
- Social media scrapers
- Web unblockers
- Large-scale web data collection
You can see the methodology for the pricing benchmark below:
Pricing methodology
Almost all prices are based on publicly disclosed packages.
However, not all vendors disclose pricing at the same volume levels. While one vendor may publish pricing for 100 GB of residential proxy usage, another may publish pricing for only 50 GB. Where pricing is not public but vendors share private pricing information with us, we include it in the benchmark, provided it does not change the vendor ranking.
Our rationale is that we want to share:
- The most accurate pricing possible with our readers
- Pricing levels that are in line with the publicly available prices, which can be constantly monitored.
Unit conversions
For the same product, vendors may price in GB or in requests, so we needed to convert between these units.
We assume an average page size of ~400 KB, based on our measurement of 1,700 e-commerce URLs. Therefore, we treat 1 GB as roughly 2,500 requests.
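The conversion is simple arithmetic; the sketch below encodes the assumption stated above (only the example $/GB rate is a hypothetical placeholder):

```python
# ~400 KB average page size implies 1 GB ≈ 2,500 requests.
AVG_PAGE_KB = 400
REQUESTS_PER_GB = 1_000_000 / AVG_PAGE_KB  # 2,500 requests per GB

def per_request_price(price_per_gb: float) -> float:
    """Convert a $/GB price into a $/request price."""
    return price_per_gb / REQUESTS_PER_GB

# e.g. a hypothetical $10/GB rate converts to $0.004 per request
```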
Packages
We looked into two packages: the enterprise PoC package and the enterprise package. The Enterprise PoC package is designed to be broadly representative of an enterprise PoC scope:
- 100 GB residential proxies
- 100 GB mobile proxies
- 500 GB datacenter proxies
- 500k unblocker requests
- 500k scraping API requests to Amazon product pages
The enterprise package is the highest-volume package with public pricing. For each product category, we identified the highest volume publicly offered by any provider and used it as the enterprise package volume for that product:
- 1,000 GB residential proxies
- 1,000 GB mobile proxies
- 5,000 GB datacenter proxies
- 2.5M unblocker requests
- 2.5M scraping API requests to Amazon product pages
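To make the package arithmetic concrete, the sketch below totals the PoC package from per-unit rates. Only the volumes come from the package definition above; the unit prices are hypothetical placeholders, not any vendor’s actual rates:

```python
# Assemble a package total from per-unit list prices.
# Volumes match the PoC package above; unit prices are hypothetical placeholders.
POC_PACKAGE = {
    "residential_gb": 100,
    "mobile_gb": 100,
    "datacenter_gb": 500,
    "unblocker_requests": 500_000,
    "scraping_api_requests": 500_000,
}

HYPOTHETICAL_UNIT_PRICES = {   # $/GB or $/request, illustrative only
    "residential_gb": 5.0,
    "mobile_gb": 8.0,
    "datacenter_gb": 0.6,
    "unblocker_requests": 0.002,
    "scraping_api_requests": 0.001,
}

total = sum(POC_PACKAGE[k] * HYPOTHETICAL_UNIT_PRICES[k] for k in POC_PACKAGE)
print(f"PoC package total: ${total:,.0f}")  # with these placeholder rates: $3,100
```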
Limitations
When enterprises procure such services at high volumes, they are likely to get discounts. Such enterprise discounts are not public and are not included in the benchmark.
Vendor-specific assumptions
Some vendors’ pricing is complex, which requires certain assumptions:
- Apify:
  - For datacenter proxies, we assumed that the user buys a $499/month package and pays $0.25/GB for platform usage.
  - For scrapers, we took the average price of two scrapers: junglee~amazon-crawler and tri_angle~walmart-product-detail-scraper.
- Oxylabs prices its unblocker on a GB-only basis. Therefore, we converted its pricing to a per-request model, assuming an average page size of ~400 KB.
- Zyte: The 4th pricing tier was recommended for the websites in our benchmark. We leveraged the HTTP response service.
Limitations and next steps
AIMultiple’s experience may differ from an average user’s experience in the following cases. Users may:
- Receive faster responses due to caching. Our work aimed to bypass caching in all providers to provide a level playing field.
- Receive fewer successful responses when extracting data from less popular websites since their requests may be blocked due to website health issues.
- Make configuration mistakes, miss KYC requirements, or get blocked when they initially send a high volume of requests. All of these can undermine their experience and success rates, though provider support teams can typically resolve such issues quickly.
Finally, network quality will fluctuate over time, and this benchmark is a series of snapshots taken during a month. It should be representative for that month, but network quality can change after the benchmark.
Acknowledgements & disclaimers for transparency
All providers contributed to this benchmark by providing part or all of the credits used. We thank them for their support of our research.
All providers in this benchmark are AIMultiple customers. Our team ensures objectivity.

