AIMultiple’s web data collection benchmark analyzes leading web unblocker and proxy service providers, including residential proxy networks. We measure:
- Effectiveness, as measured by success rate
- Scalability, which is critical for high-scale scraping operations. This depends on:
  - Scraping time per page
  - Variation in scraping time per page
How to choose the right proxy for your web data task?
Reliable web data collection
If you aim to retrieve all the pages in a large (>10,000 URLs) target URL list that includes URLs protected by anti-scraping techniques: Work with market-leading proxy companies like Bright Data, Nimble, Oxylabs and Smartproxy that offer both unblocker services and proxies.
You can use these leading companies’:
- Proxies (e.g. datacenter proxies) for easy-to-scrape web domains
- Web unblockers to reach pages that were not successfully retrieved by proxies. For example, unblockers are necessary for domains that employ anti-scraping techniques.
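The division of labor above can be sketched as a per-URL fallback in Python. This is a minimal sketch, not any vendor’s API: `collect`, `fetch_via_proxy` and `fetch_via_unblocker` are hypothetical names, and each fetcher is assumed to return an HTTP status code and the response body. A production version would also apply a size check to 200 responses.

```python
from typing import Callable, Dict, List, Tuple

# A fetcher takes a URL and returns (HTTP status code, response body).
# The concrete fetchers are hypothetical placeholders for your own client code.
Fetcher = Callable[[str], Tuple[int, bytes]]


def collect(urls: List[str], fetch_via_proxy: Fetcher,
            fetch_via_unblocker: Fetcher) -> Dict[str, str]:
    """Try the cheaper proxy first and send only its failures to the unblocker.

    Returns a mapping from URL to the tier that succeeded ("proxy" or
    "unblocker"), or "failed" if neither returned a 200 response.
    """
    outcome: Dict[str, str] = {}
    for url in urls:
        status, _body = fetch_via_proxy(url)
        if status == 200:
            outcome[url] = "proxy"
            continue
        # Fall back to the more expensive unblocker for this URL only.
        status, _body = fetch_via_unblocker(url)
        outcome[url] = "unblocker" if status == 200 else "failed"
    return outcome
```

Because the unblocker is only called for URLs the proxy could not retrieve, the more expensive service is billed for the smallest possible share of the URL list.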
Cost-effective web data collection
If you are targeting easy-to-crawl websites, are OK with not getting results for up to 20% of the pages on your first try, and are not concerned with speed or scalability: Leverage any proxy provider based on its pricing. For more, check out AIMultiple’s proxy pricing guide.
Large-scale web data collection
If you require fresh data at large volumes and therefore care about speed & scalability: Reach out to the fastest providers in the scalability section.
Finally, if you are part of a technical team with web scraping experience that works on very large-volume scraping tasks, you can consider buying residential proxies and writing code to overcome anti-scraping challenges. You could either tune request frequency and parameters to reduce the likelihood of CAPTCHA tests or build systems and processes to solve CAPTCHAs.
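As an illustration of tuning request frequency, the sketch below spaces out requests to the same domain with a base delay plus random jitter. `DomainThrottle` and its default `min_interval` and `jitter` values are hypothetical choices for illustration, not figures from this benchmark; the right pacing depends on the target site.

```python
import random
import time
from typing import Dict, Optional


class DomainThrottle:
    """Space out requests to the same domain with a base delay plus random jitter.

    Illustrative only: the default interval and jitter are assumptions.
    """

    def __init__(self, min_interval: float = 2.0, jitter: float = 1.0,
                 rng: Optional[random.Random] = None) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self.rng = rng or random.Random()
        self._last_request: Dict[str, float] = {}

    def delay_for(self, domain: str, now: float) -> float:
        """Seconds to wait before the next request to `domain` at time `now`."""
        last = self._last_request.get(domain)
        if last is None:
            return 0.0  # first request to this domain: no need to wait
        target_gap = self.min_interval + self.rng.uniform(0.0, self.jitter)
        return max(0.0, last + target_gap - now)

    def record(self, domain: str, now: float) -> None:
        """Remember when a request to `domain` was sent."""
        self._last_request[domain] = now

    def wait(self, domain: str) -> None:
        """Block until the next request to `domain` is due, then record it."""
        pause = self.delay_for(domain, time.monotonic())
        if pause > 0:
            time.sleep(pause)
        self.record(domain, time.monotonic())
```

Randomizing the gap avoids the perfectly regular request rhythm that is easy for anti-scraping systems to recognize, while the per-domain bookkeeping lets requests to different domains proceed independently.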
Enterprise web data collection
This assessment is focused on effectiveness and scalability. If you are an enterprise user, you should also evaluate these vendors from a governance, risk and compliance (GRC) perspective, ensuring that they:
- Have the systems, processes and people to secure your data
- Have acceptable use policies in place to ensure that:
  - your web data collection efforts are compliant
  - they are not supporting illegal activities
- Work with ethical and legally compliant suppliers
AIMultiple is working on a benchmark for ethical and compliant web data collection for enterprises. If this is important for you, please reach out via info at aimultiple.com or connect with AIMultiple’s Principal Analyst.
What are the latest benchmark results?
Results from the 2024 Q4 benchmark are listed below. For definitions, see the methodology section.
Response times
For a quick overview, see how different operators compare in terms of response times:
- Gray lines in the middle: median response time
- Boxes: variability in response time, spanning the lower to the upper quartile response times. The box therefore contains the middle 50% of responses.
- The gray lines at the extremes (i.e. whiskers) extend up to 1.5 IQRs (interquartile ranges) beyond the box edges
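The box-plot elements above can be computed from raw response times. This Python sketch uses the standard library’s `statistics.quantiles`; the `inclusive` quartile method and the Tukey convention of ending each whisker at the most extreme response within 1.5 IQRs of the box are assumptions, and AIMultiple’s charting tool may compute them differently.

```python
import statistics
from typing import Dict, List


def box_plot_stats(times: List[float]) -> Dict[str, float]:
    """Median, quartiles, IQR and Tukey-style whisker ends for response times (s)."""
    q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences;
    # anything beyond them would be drawn as an outlier point.
    inside = [t for t in times if low_fence <= t <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }
```

For example, with response times of 1 to 9 seconds plus one 30-second outlier, the box spans 3.25 to 7.75 seconds and the upper whisker ends at 9 seconds, leaving the 30-second response as an outlier.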
Success rates
Success rates depend on numerous parameters apart from the proxy provider. We explained this in success rates of proxy providers.
No statistically significant differences emerged between the proxies offered by the different providers.
We analyzed 8 batches of requests made to Amazon domains in our URL set during December 2024 and January 2025. By sampling 10 consecutive requests to each of the 7 Amazon domains (70 requests per batch, 150 batches in total), we calculated the expected success rate range of a random batch with 95% confidence. These ranges were so wide that there was no statistically significant difference between the products in terms of success rate.
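A rough way to run this kind of comparison is sketched below, assuming the 95% range is approximated as the mean batch success rate plus or minus 1.96 standard deviations of the per-batch rates (each rate being successes divided by the 70 requests in a batch). The function names are hypothetical and AIMultiple’s exact statistical procedure may differ; treating overlapping ranges as "no clear difference" is a coarse heuristic, not a formal hypothesis test.

```python
import statistics
from typing import List, Tuple


def success_rate_range(batch_rates: List[float], z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% range for the success rate of a random batch.

    Assumes per-batch success rates are roughly normally distributed and that
    at least two batches are given. The range is clipped to [0, 1] because a
    success rate cannot leave that interval.
    """
    mean = statistics.mean(batch_rates)
    sd = statistics.stdev(batch_rates)
    return max(0.0, mean - z * sd), min(1.0, mean + z * sd)


def ranges_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Coarse check: overlapping ranges suggest no clear difference between products."""
    return a[0] <= b[1] and b[0] <= a[1]
```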
Pricing
How did benchmark results evolve?
Results from the 2023 and 2024 benchmarks have been relatively consistent, considering that the number of participants increased from 5 to 9 in 2024 and that AIMultiple introduced an additional verification step in 2024.
- Regardless of the brand, residential proxies either work or don’t work for a specific site, depending on that website’s anti-scraping approach:
  - For websites that rely heavily on anti-scraping techniques, the success rate was:
    - ~0.2% on average for all 9 providers in 2024 Q1, with insignificant differences between providers.
    - ~0% (just a handful of successful results out of 1,700 pages) for all 5 providers in 2023 Q4.
  - For lightly protected websites, the success rate ranged from:
    - 83-94% in 2024 Q1. This is lower than in 2023 because AIMultiple’s new methodology introduced an additional verification step to ensure the correctness of successful results.
    - 90-99% in 2023 Q4.
- There are significant speed differences between brands in parameters that impact scalability:
  - Average response time ranges from:
    - 2 to 15 seconds in 2024 Q4.
    - 3 to 7 seconds in 2024 Q1.
    - 2 to 6 seconds in 2023 Q4.
  - Standard deviation of response time ranges from:
    - 2 to 17 seconds in 2024 Q4.
    - 2 to 8 seconds in 2024 Q1.
    - 1 to 7 seconds in 2023 Q4.
What are the best residential proxy / unblocker providers?
Effectiveness
Effectiveness is measured by success rate: The percentage of connection requests that are successfully processed by the proxy server (i.e. returns a 200 response code). This is measured only for pages that don’t employ anti-scraping measures. Residential proxies were not successful against pages that employ such techniques.
In addition, unrealistically small files are assumed to be unsuccessful. Providers sometimes return results that have a 200 response code but don’t include the full page.
For example, some results with a 200 response code include error messages explaining why the page was not reached. These error messages take up significantly less space than successful results. Therefore, the result with the smallest file size is assumed to be wrong if the next larger result is at least 20% larger than the smallest result. The 20% threshold was determined through manual sampling.
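The size rule can be written as a small function. This sketch assumes the comparison is made among the 200-code results that different providers returned for the same page, which the text implies but does not spell out; `flag_truncated` and the provider names in `sizes` are hypothetical.

```python
from typing import Dict, Set


def flag_truncated(sizes: Dict[str, int], threshold: float = 0.20) -> Set[str]:
    """Return the providers whose result for one page is assumed to be wrong.

    `sizes` maps a provider name to the size in bytes of its 200-code result.
    Only the smallest result is checked: it is flagged when the next larger
    result is at least `threshold` (20%) larger than it.
    """
    if len(sizes) < 2:
        return set()  # nothing to compare against
    ranked = sorted(sizes.items(), key=lambda item: item[1])
    (smallest_name, smallest), (_, next_larger) = ranked[0], ranked[1]
    if next_larger >= smallest * (1 + threshold):
        return {smallest_name}
    return set()
```

A 1 KB error page next to roughly 100 KB product pages gets flagged, while results of similar size are all kept.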
Residential proxy
2024 Q4 results
For all benchmarked URLs, residential proxies were effective in 21-37% of the pages depending on the provider.
2024 Q1 results
Success rate in:
- Lightly-protected websites: all benchmarked providers achieved 83-94% success rate.
- Highly-protected websites: Most pages were not crawlable.
2023 Q4 results
Success rate was the percentage of connection requests that were successfully processed by the proxy server (i.e. that returned a 200 response).
In terms of success rate on lightly protected websites, all benchmarked providers achieved a >90% success rate, and AIMultiple found the results not differentiated enough to share on a vendor-by-vendor basis.
Web unblocker
Effectiveness is measured in the same manner as effectiveness of residential proxies.
2024 Q4 results
Benchmarked providers achieved 52-98% accuracy on average.
2024 Q1 results
All benchmarked providers achieved 93-94% accuracy on average across the 6 domains which included both easy-to-scrape domains and domains with anti-scraping measures.
Scalability
The graph is similar to the response times graph from our latest benchmark:
Sorting: The product with the shortest average response time is at the top.
AIMultiple measures these scalability metrics:
- Average time to successful response (ATSR): The average time it takes to obtain a successful response from the proxy network, measured in seconds.
- Standard deviation of successful response time (SDSRT): Indicates the variability of the response time in seconds. A lower standard deviation reflects more consistent proxy performance.
- Upper bound for successful response: ATSR + SDSRT. Assuming normally distributed response times, about 84% of successful responses arrive within this bound (roughly 68% fall within one standard deviation on either side of the average).
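The three metrics and the sorting used in the graph can be computed from the response times of successful requests. A minimal sketch: the text does not say whether the sample or population standard deviation was used, so `statistics.stdev` (sample) is an assumption, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def scalability_metrics(times: List[float]) -> Tuple[float, float, float]:
    """Return (ATSR, SDSRT, upper bound) in seconds for successful responses."""
    atsr = statistics.mean(times)
    sdsrt = statistics.stdev(times)  # sample standard deviation (assumption)
    return atsr, sdsrt, atsr + sdsrt


def rank_by_atsr(results: Dict[str, List[float]]) -> List[str]:
    """Order products from shortest to longest average time to successful response."""
    return sorted(results, key=lambda name: statistics.mean(results[name]))
```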
Residential proxies
2024 Q1 results
Upper bound for successful response varied between 5 and 14 seconds. Oxylabs was the fastest provider with 5 seconds. The values for the other providers ranged from 8 to 14 seconds.
2023 Q4 results
Smartproxy and Oxylabs led the pack in terms of average time to extract data and the standard deviation of time to extract data.
Web unblocker
2024 Q1 results
Upper bound for successful response varied between 6 and 28 seconds. Nimble was the fastest provider with 6 seconds. The values for the other providers ranged from 6 to 28 seconds.
Methodology
2024-Q4
Participants: The AIMultiple team benchmarked 24 proxy services covering all proxy types (e.g. datacenter, mobile) from 7 different providers, focusing only on services that use rotating proxies from an IP pool.
Proxies offering dedicated IPs were excluded to ensure a fair comparison between the options, as determining the right number of dedicated IPs to use for a balanced comparison with rotating proxies can be complex.
Scope: 5,000 URLs from 25 distinct domains were used.
Each proxy type from each provider was used to send synchronous GET requests to the selected URLs, with the response time and returned content being recorded.
A successful request was defined as one where the HTTP response code was 200 and the returned content was of a reasonable size. The size threshold was determined by comparing the content sizes of different responses. If a response was smaller than half the average response size, it was classified as incorrect or missing. This approach was validated by cross-checking with the actual responses, and it was found to reliably identify most unsuccessful responses that still returned a 200 HTTP code.
Timing: Data collection occurred in November 2024.
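The 2024 Q4 success criterion can be sketched as follows. The text does not state which responses form the average, so this sketch assumes the average is taken over all responses collected for the same URL; `classify_responses` is a hypothetical name and each response is represented as a `(status_code, size_in_bytes)` pair.

```python
from statistics import mean
from typing import List, Tuple


def classify_responses(responses: List[Tuple[int, int]]) -> List[bool]:
    """Mark each (status_code, size_in_bytes) response as successful or not.

    A response counts as successful only if it returned HTTP 200 and its body
    is at least half the average size of the responses being compared.
    """
    if not responses:
        return []
    avg_size = mean(size for _, size in responses)
    return [status == 200 and size >= avg_size / 2 for status, size in responses]
```

The success rate for a set of responses is then the share of `True` values, e.g. `sum(flags) / len(flags)`.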
2024-Q1
AIMultiple team leveraged each proxy brand to extract data from pre-selected URLs. Requests were processed sequentially. 9 proxies and 5 web unblockers were benchmarked. Participants:
- Web unblocker providers: Bright Data, Nimble, Oxylabs, Smartproxy.
- Proxy providers included all web unblocker providers and IPRoyal, NetNut, ProxyEmpire, Rayobyte, Webshare.
8,400 URLs from the same domains as in 2023 were used. The same URLs were used for both web unblockers and residential proxies.
The data collection occurred in March and April 2024.
2023
Participating proxy providers: IPRoyal, Oxylabs, Rayobyte, Smartproxy, Webshare.
The benchmark focused only on residential proxies. The target URL set included 10,200 URLs from 6 websites in these domains:
- E-commerce: 3
- Travel: 1
- HR: 1
- Real estate: 1
The data collection occurred in November 2023.
What is missing in proxies and web unblockers?
The easy button. You have 2 options to consistently collect web data:
- Use web unblockers for every URL, but unblockers are more expensive than proxies.
- For each domain, start from the cheapest solution (i.e. datacenter proxies) and work up to unblockers to find the most cost-effective approach that works for each web domain you are scraping. This is cost-effective but requires development effort.
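The second option amounts to a per-domain search from the cheapest tier upward, with the result cached so the search is not repeated for every URL. In this sketch the tier names, `TierSelector` and the `probe` callable (which would send a few test requests through a tier and report whether they succeeded) are hypothetical; `invalidate` covers domains that later add anti-scraping measures.

```python
from typing import Callable, Dict, Optional, Sequence

# Ordered from cheapest to most expensive; the names are illustrative.
DEFAULT_TIERS = ("datacenter", "residential", "unblocker")


class TierSelector:
    """Find and remember the cheapest tier that works for each domain."""

    def __init__(self, probe: Callable[[str, str], bool],
                 tiers: Sequence[str] = DEFAULT_TIERS) -> None:
        self.probe = probe  # probe(domain, tier) -> True if the tier succeeded
        self.tiers = tiers
        self.cache: Dict[str, Optional[str]] = {}

    def tier_for(self, domain: str) -> Optional[str]:
        """Cheapest working tier for `domain`, or None if every tier failed."""
        if domain not in self.cache:
            self.cache[domain] = next(
                (tier for tier in self.tiers if self.probe(domain, tier)), None)
        return self.cache[domain]

    def invalidate(self, domain: str) -> None:
        """Forget a domain's tier, e.g. after it adds new anti-scraping measures."""
        self.cache.pop(domain, None)
```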
The second approach makes sense for larger volumes, but why do proxy providers make every team build the same function? Providers constantly scrape domains for clients and could monitor results anonymously to identify the most cost-effective tool for every web domain, updating this over time as domains implement more anti-scraping measures.
This would save users time or money and increase customer retention for the provider that delivers this functionality.
Limitations and next steps
Effectiveness measurement
Success rate was measured based on these 2 factors:
- The HTTP status code returned by the proxy provider
- The size of the proxy provider’s response
One anti-scraping technique is to return success codes to crawlers while serving slightly incorrect information. AIMultiple’s benchmark ignored this technique since it was challenging to verify the correctness of webpage data from ~10k pages.
Unsuccessful crawl attempts via proxies and unblockers were not retried. In practice, a user could try to scrape the same page again, for example using the unblocker. AIMultiple’s next benchmark may incorporate multiple attempts to scrape data correctly.
Scope
Scrapers and datasets were not part of the benchmark, although they can be used to access similar data. Scraper benchmarks are available in scraper-focused pages:
Proxy server FAQ
1. What is a proxy?
A proxy server is an intermediary between a user’s device and the target website. When you use a proxy server, your internet traffic reaches its destination from the proxy server’s IP address, which masks your own.
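For illustration, this is how a client can route requests through a proxy using Python’s standard library `urllib.request.ProxyHandler`. The proxy host, port and credentials are placeholders rather than a real provider endpoint; each provider documents its own connection format.

```python
import urllib.request


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS traffic through `proxy_url`.

    The target site sees the proxy's IP address instead of yours.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder endpoint and credentials; replace them with your provider's details.
opener = build_proxied_opener("http://USERNAME:PASSWORD@proxy.example.com:8000")
# response = opener.open("https://example.com", timeout=30)  # network call, not run here
```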
2. What are the different types of proxies?
Proxies are typically classified into two main categories: residential and datacenter proxies.
- Residential proxies: Residential proxies are associated with an IP address provided by an Internet Service Provider (ISP).
  - Static residential proxies (ISP): ISP proxies are residential IP addresses that remain consistent over time.
  - Mobile proxies: These are residential proxies that use IP addresses assigned to mobile devices, such as phones and tablets, by mobile network operators.
- Datacenter proxies: Datacenter IP addresses are provided by data centers rather than internet service providers.
3. Are proxies legal?
Using a proxy server is not inherently illegal. However, the legality of using proxy servers depends on how they are used and on the laws in your country.
4. How do I choose the right proxy provider?
It’s essential to determine the type of proxy suitable for your requirements. For instance, datacenter proxies typically provide faster speeds than residential proxies, making them suitable for high-speed tasks. If selecting IPs from particular countries is important, verify that the proxy provider supplies proxies in those regions. Consult user reviews on third-party review platforms for unbiased opinions. Additionally, check whether the provider offers a trial period or a money-back guarantee, enabling you to evaluate their service before fully committing.
5. What is the difference between a VPN and a proxy?
VPNs (Virtual Private Networks) and proxies both route internet traffic through a server, concealing your IP address in the process. VPNs encrypt all the data transmitted between your device and the VPN server, including your browsing activity. In contrast, a proxy server only hides your IP address and redirects your requests. A VPN encrypts and routes all of your internet traffic, not just your browser’s, whereas a proxy only redirects traffic from specific applications or your browser. Generally, proxies offer faster speeds than VPNs.
Transparency statement
AIMultiple serves numerous emerging tech companies, including Smartproxy and Oxylabs.
More on proxy server
- Top 10 Proxy Service Providers for Web Scraping
- 6 Best Reliable Proxy Providers: Selecting the Best
- Top 7 Premium Proxies: A Comprehensive Review