AIMultiple Research

Unblocker & Proxy Benchmark: 9 Vendors vs ~10k URLs in 2024

Updated on May 23
Written by
Cem Dilmegani

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per Similarweb), including 60% of the Fortune 500, every month.

Cem's work focuses on how enterprises can leverage new technologies in AI, automation, cybersecurity (including network security and application security), data collection including web data collection and process intelligence.

Researched by
Gulbahar Karatas
Gülbahar is an AIMultiple industry analyst focused on web data collection, applications of web data and application security.

She is a frequent user of the products that she researches. For example, she is part of AIMultiple's web data benchmark team, which annually measures the performance of the top 9 web data infrastructure providers.

She previously worked as a marketer in the U.S.

AIMultiple’s web data collection benchmark provides an in-depth analysis of nine web unblocker and proxy service providers. Our evaluation focuses on residential proxy networks and web unblockers, covering:

  • Effectiveness, as measured by success rate
  • Scalability, which is critical for high-scale scraping operations, as measured by the upper bound for a successful response. This depends on:
    • Scraping time per page
    • Variation in scraping time per page

How to choose the right proxy for your web data task?

If you aim to retrieve every page in a large target list (>10,000 URLs) that includes URLs protected by anti-scraping techniques: work with market-leading proxy companies like Bright Data, Nimble, Oxylabs and Smartproxy, which offer both unblocker services and proxies.

You can use these leading companies’:

  • Proxies for easy-to-scrape web domains
  • Web unblockers to reach pages that were not successfully retrieved by residential proxies
  • Web unblockers on domains that employ anti-scraping techniques, to complete your web data collection

If you are targeting easy-to-crawl websites, are OK with not getting results for up to 20% of the pages on your first try, and:

  • are not concerned with speed or scalability: you can use any proxy provider and choose based on pricing. For more, check out AIMultiple’s proxy pricing guide.
  • require fresh data at large volumes and therefore care about speed and scalability: reach out to the fastest providers in the scalability section.

Finally, if you are part of a technical team with web scraping experience that handles very large scraping volumes, you can consider buying residential proxies and writing your own code to overcome anti-scraping challenges. You could either tune request frequency and request parameters to reduce the likelihood of CAPTCHA tests, or build systems and processes to solve CAPTCHAs.
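As a rough illustration of adjusting request frequency, the sketch below spaces out requests with a randomized pause between them. The function name `paced` and the delay values are assumptions made for this example and are not settings used in the benchmark. Suitable values depend on the target site, and pacing alone does not guarantee that CAPTCHAs are avoided.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional


def paced(
    urls: Iterable[str],
    base_delay_s: float = 2.0,  # assumed value, not taken from the benchmark
    jitter_s: float = 1.0,      # assumed value, not taken from the benchmark
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Iterator[str]:
    """Yield URLs one at a time, pausing a randomized interval between them."""
    rng = rng or random.Random()
    for index, url in enumerate(urls):
        if index > 0:
            # Wait base_delay_s plus a random extra so requests are not evenly spaced.
            sleep(base_delay_s + rng.uniform(0.0, jitter_s))
        yield url
```

A scraper would iterate over `paced(target_urls)` and fetch each URL inside the loop. The `sleep` parameter is injectable so that the pacing logic can be checked without actually waiting.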

Please note that this assessment purely focuses on effectiveness and scalability. If you are an enterprise user, you should also evaluate these vendors from a governance, risk and compliance (GRC) perspective ensuring that they

  • Have the systems, processes and people to secure your data
  • Have acceptable use policies in place to ensure that
    • your web data collection efforts are compliant
    • they are not supporting illegal activities
  • Work with ethical and legally compliant suppliers

AIMultiple is working on a benchmark for ethical and compliant web data collection for enterprises. If this is important for you, please reach out via info at aimultiple.com or connect to AIMultiple’s Principal Analyst.

What are the benchmark results?

Results from 2023 and 2024 benchmarks have been relatively consistent if we consider that participants increased from 5 to 9 in 2024 and that AIMultiple introduced one more verification in 2024.

  • Regardless of the brand, residential proxies either work or don’t work for a specific site based on that website’s anti-scraping approach:
    • For websites which heavily rely on anti-scraping techniques, the success rate was:
      • ~0.2% on average for all 9 providers with insignificant differences between different providers in 2024.
      • ~0% (just a handful of successful results out of 1,700 pages) for all 5 providers in 2023.
    • For lightly protected websites, success rate ranged from:
      • 83-94% in 2024. This is lower than the values in 2023 because AIMultiple’s new methodology introduced one more verification to ensure the correctness of successful results.
      • 90 to 99% in 2023
  • There are significant speed differences between brands in parameters that impact scalability.
    • Average response time ranges from:
      • 3 to 7 seconds in 2024
      • 2 to 6 seconds in 2023
    • Standard deviation of response time ranges from:
      • 2 to 8 seconds in 2024
      • 1 to 7 seconds in 2023

What are the best residential proxy / unblocker providers?

Effectiveness

Residential proxy

Effectiveness is measured by success rate: the percentage of connection requests that are successfully processed by the proxy server (i.e. that return a 200 response code). This is measured only for pages that don’t employ anti-scraping measures, because residential proxies were not successful against pages that employ such techniques.

In addition, unrealistically small files are assumed to be unsuccessful. Providers sometimes return results

  • With a 200 response code
  • That don’t include the full page.

For example, some results with a 200 response code contain an error message explaining why the page was not reached. These error messages take up significantly less space than successful results. Therefore, the result with the smallest file size is assumed to be wrong if the next larger result is 20+% larger than the smallest result. The 20% threshold was chosen based on manual sampling.
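The two checks described above can be sketched as follows. This is one possible reading of the rule, not AIMultiple’s actual code: it assumes that file sizes are compared across the results that different providers returned for the same URL, and the names `Result`, `flag_successes` and `success_rate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Result:
    provider: str
    status: int       # HTTP response code returned through the provider
    size_bytes: int   # size of the returned body


def flag_successes(results: List[Result], gap: float = 0.20) -> Dict[str, bool]:
    """Flag each provider's result for ONE URL as successful or not."""
    # Check 1: only a 200 response code counts as a candidate success.
    ok = {r.provider: r.status == 200 for r in results}
    candidates = sorted(
        (r for r in results if r.status == 200), key=lambda r: r.size_bytes
    )
    # Check 2: the smallest result is rejected when the next larger
    # result is at least 20% bigger (assumed reading of the rule).
    if len(candidates) >= 2:
        smallest, next_larger = candidates[0], candidates[1]
        if next_larger.size_bytes >= smallest.size_bytes * (1 + gap):
            ok[smallest.provider] = False
    return ok


def success_rate(flags: Iterable[bool]) -> float:
    """Percentage of requests flagged as successful."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)
```

With this reading, a 1 KB body that arrives with a 200 code alongside 50 KB bodies from other providers is counted as a failure, while bodies of 48 KB and 50 KB are both accepted.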

In terms of success rate in lightly protected websites, all benchmarked providers achieved 83-94% success rate.

Since the results are relatively close, AIMultiple is not sharing exact percentages. It may not make a major difference if a brand achieved 83% success rate while another achieved 93%. In both cases, another crawl of unsuccessful pages is necessary.

2023 results

Success rate was the percentage of connection requests that were successfully processed by the proxy server (i.e. that returned a 200 response).

In terms of success rate in lightly protected websites, all benchmarked providers achieved >90% success rate and AIMultiple found the results to be not differentiated enough to share on a vendor-by-vendor basis.

Web unblocker

Effectiveness is measured in the same manner as effectiveness of residential proxies.

All benchmarked providers achieved a 93-94% success rate on average across the 6 domains, which included both easy-to-scrape domains and domains with anti-scraping measures.

Scalability

AIMultiple tracks these scalability metrics:

  • Average time to successful response (ATSR): The average time it takes to obtain a successful response from the proxy network, measured in seconds.
  • Standard deviation of successful response time (SDSRT): Indicates the variability of the response time in seconds. A lower standard deviation reflects more consistent proxy performance.
  • Upper bound for successful response: ATSR + SDSRT. Assuming response times are normally distributed, about 68% of successful responses fall within one standard deviation of the average, so roughly 84% arrive at or below this upper bound.
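The three metrics above can be computed from a list of successful response times as in the sketch below. The timings are made-up sample values, and the use of the sample standard deviation is an assumption, since the article does not say whether the sample or population formula was used. The last lines show, for a normal distribution, the share of values within one standard deviation of the mean and the share at or below the mean plus one standard deviation.

```python
import statistics
from typing import Sequence, Tuple


def scalability_metrics(success_times_s: Sequence[float]) -> Tuple[float, float, float]:
    """Return (ATSR, SDSRT, upper bound) in seconds for successful responses."""
    atsr = statistics.mean(success_times_s)
    # Sample standard deviation; an assumption, the article does not specify.
    sdsrt = statistics.stdev(success_times_s)
    return atsr, sdsrt, atsr + sdsrt


timings = [2.0, 4.0, 6.0]  # made-up sample, in seconds
atsr, sdsrt, upper_bound = scalability_metrics(timings)

# Shares under a standard normal distribution.
standard_normal = statistics.NormalDist()
share_within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)  # about 0.68
share_below_upper_bound = standard_normal.cdf(1.0)  # about 0.84
```

Real response times are often right-skewed rather than normal, so these shares are only an approximation.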

Residential proxies

The upper bound for a successful response varied between 5 and 14 seconds. Oxylabs was the fastest provider with 5 seconds. The values for the other providers ranged from 8 to 14 seconds.

2023 results

Smartproxy and Oxylabs led the pack in terms of average time to extract data and the standard deviation of time to extract data.

Proxy provider | Average Successful Response Time (s) | Standard Deviation of Successful Response Time (s)
Oxylabs | 2 | 1
Smartproxy | 2 | 2
Webshare | 4 | 6
IPRoyal | 4 | 5
Rayobyte | 6 | 7
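For comparison with the 2024 figures, the upper bound (average plus standard deviation) can be derived from the 2023 values listed above. The snippet below is simple arithmetic on those published, rounded numbers. The variable names are chosen for this example, and the derived bounds are not figures reported by AIMultiple.

```python
# (average, standard deviation) of successful response time in seconds, 2023
results_2023 = {
    "Oxylabs": (2, 1),
    "Smartproxy": (2, 2),
    "Webshare": (4, 6),
    "IPRoyal": (4, 5),
    "Rayobyte": (6, 7),
}

# Upper bound for successful response = average + standard deviation.
upper_bounds = {name: avg + sd for name, (avg, sd) in results_2023.items()}

# Providers ordered from lowest (fastest) to highest upper bound.
ranked = sorted(upper_bounds.items(), key=lambda item: item[1])
```

Because the inputs are rounded to whole seconds, the derived bounds are approximate.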

Web unblocker

The upper bound for a successful response varied between 6 and 28 seconds. Nimble was the fastest provider with 6 seconds. The values for the other providers ranged from 6 to 28 seconds.

Methodology

2024

The AIMultiple team used each proxy brand to extract data from pre-selected URLs. Requests were processed sequentially. 9 proxies and 5 web unblockers were benchmarked. Participants:

  • Web unblocker providers: Bright Data, Nimble, Oxylabs, Smartproxy.
  • Proxy providers: all of the web unblocker providers, plus IPRoyal, NetNut, ProxyEmpire, Rayobyte and Webshare.

8,400 URLs were used from the same domains as in 2023. The same URLs were used for both web unblockers and residential proxies.

The data collection occurred in March and April 2024.

2023

Participating proxy providers: IPRoyal, Oxylabs, Rayobyte, Smartproxy, Webshare.

The benchmark focused only on residential proxies. The target URL set included 10,200 URLs from 6 websites in these domains:

  • E-commerce: 3 
  • Travel: 1
  • HR: 1
  • Real estate: 1

The data collection occurred in November 2023.

What is missing in proxies and web unblockers?

The easy button. You have 2 options to consistently collect web data:

  1. Use web unblockers for every URL but unblockers are more expensive than proxies.
  2. For each domain, start from the cheapest solution (i.e. data center proxies), going all the way to unblockers to find the most cost-effective approach that works for each web domain that you are scraping. This is cost effective but requires development effort.
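The second option can be sketched as a small router that tries tiers from cheapest to most expensive and remembers, per domain, the cheapest tier that worked. Everything here is hypothetical: the class name `TierRouter`, the tier names, and the fetcher functions, which stand in for whatever datacenter proxy, residential proxy or unblocker client you use. Each fetcher is assumed to return the page body, or `None` when the attempt fails.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# A fetcher takes a URL and returns the page body, or None on failure.
Fetcher = Callable[[str], Optional[bytes]]


class TierRouter:
    """Try tiers from cheapest to most expensive, remembering what works per domain."""

    def __init__(self, tiers: List[Tuple[str, Fetcher]]) -> None:
        self.tiers = list(tiers)              # ordered cheapest first
        self.start_index: Dict[str, int] = {} # domain -> tier index that last worked

    def fetch(self, url: str) -> Tuple[Optional[str], Optional[bytes]]:
        domain = urlparse(url).netloc
        # Skip tiers already known to fail for this domain.
        start = self.start_index.get(domain, 0)
        for index in range(start, len(self.tiers)):
            name, fetcher = self.tiers[index]
            body = fetcher(url)
            if body is not None:
                self.start_index[domain] = index
                return name, body
        return None, None  # every remaining tier failed
```

This sketch never moves a domain back to a cheaper tier. A production version would also need to retest periodically, since a domain’s anti-scraping measures can change over time.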

The second approach makes sense for larger volumes, but why do proxy providers make every team build the same logic themselves? Providers constantly scrape domains for clients, so they could monitor results anonymously, identify the most cost-effective tool for every web domain, and update this over time as domains implement more anti-scraping measures.

This would save users time or money and increase customer retention for the provider that delivers this functionality.

Limitations and next steps

Effectiveness measurement

Success rate was measured based on these 2 factors:

  • The response code returned by the proxy provider
  • The size of the proxy provider’s response

An anti-scraping technique is to return successful codes to crawlers while sharing slightly incorrect information. AIMultiple’s benchmark ignored this technique since it was challenging to verify correctness of webpage data from ~10k pages.

For unsuccessful crawl attempts via proxies and unblockers, no further attempts were made. In practice, a user could try to scrape the same page again, for example using the unblocker. AIMultiple’s next benchmark may incorporate multiple attempts to scrape data correctly.

Use cases

The benchmark focused only on the e-commerce domain. Other use cases such as scraping Google SERP results could be included.

Scope

Scrapers or data sets were not part of the benchmark. They can be used to access similar data.

Proxy server FAQ

1. What is a proxy?

A proxy server is an intermediary between a user’s device and the target website. When you use a proxy server, your internet traffic is masked by the proxy server’s IP address before reaching its destination. 
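As a minimal sketch of what this looks like in code, Python’s standard library can route requests through a proxy with `urllib.request.ProxyHandler`. The proxy address and credentials below are placeholders, not a real endpoint, and the helper name `build_proxied_opener` is chosen for this example.

```python
import urllib.request

# Placeholder proxy endpoint; replace with the address your provider gives you.
PROXY_URL = "http://user:password@proxy.example.com:8000"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# opener.open("https://example.com", timeout=30) would send the request
# through the proxy, so the target site sees the proxy's IP address.
```

No request is sent when the opener is built. The commented call shows where the proxy would come into play.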

2. What are the different types of proxies?

Proxies are typically classified into two main categories: residential and datacenter proxies.

  • Residential proxies: Residential proxies are associated with an IP address provided by an Internet Service Provider (ISP).
    • Static residential proxies (ISP): ISP proxies are residential IP addresses that remain consistent over time. 
    • Mobile proxies: These are residential proxies that use IP addresses assigned to mobile devices, such as phones and tablets, by mobile network operators. 
  • Datacenter proxies: Datacenter IP addresses are provided by data centers rather than internet service providers. 

3. Is it legal to use a proxy server?

Utilizing a proxy server is not inherently illegal. However, the legality depends on how the proxy is used and on the laws in your specific country.

4. How do I choose the right proxy provider?

It’s essential to determine the type of proxy suitable for your requirements. For instance, datacenter proxies typically provide faster speeds compared to residential proxies, making them suitable for high-speed tasks. If selecting IPs from particular countries is important, verify that the proxy provider supplies proxies in those regions. Consult user reviews on third party review platforms for unbiased opinions. Additionally, see if the provider allows a trial period or offers a money-back guarantee, enabling you to evaluate their service before fully committing.

5. What is the difference between a VPN and a proxy?

VPNs (Virtual Private Networks) and proxies both serve to route internet traffic through a server, concealing your IP address in the process. VPNs provide encryption for all the data transmitted between your device and the VPN server, including your browsing history. In contrast, a proxy server only hides your IP address and redirects your requests. While a VPN encrypts and routes all your internet traffic, not just limited to your browser, a proxy only redirects traffic from specific applications or your browser. Generally, proxies offer faster speeds compared to VPNs.

Transparency statement

AIMultiple serves numerous emerging tech companies, including Smartproxy and Oxylabs.

Cem Dilmegani
Principal Analyst

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.

Cem's hands-on enterprise software experience contributes to the insights that he generates. He oversees AIMultiple benchmarks in dynamic application security testing (DAST), data loss prevention (DLP), email marketing and web data collection. Other AIMultiple industry analysts and tech team support Cem in designing, running and evaluating benchmarks.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led the technology strategy and procurement of a telco while reporting to the CEO. He also led the commercial growth of deep tech company Hypatos, which went from 0 to a 7-digit annual recurring revenue and a 9-digit valuation within 2 years. Cem's work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.

Sources:

AIMultiple.com Traffic Analytics, Ranking & Audience, Similarweb.
Why Microsoft, IBM, and Google Are Ramping up Efforts on AI Ethics, Business Insider.
Microsoft invests $1 billion in OpenAI to pursue artificial intelligence that’s smarter than we are, Washington Post.
Data management barriers to AI success, Deloitte.
Empowering AI Leadership: AI C-Suite Toolkit, World Economic Forum.
Science, Research and Innovation Performance of the EU, European Commission.
Public-sector digitization: The trillion-dollar challenge, McKinsey & Company.
Hypatos gets $11.8M for a deep learning approach to document processing, TechCrunch.
We got an exclusive look at the pitch deck AI startup Hypatos used to raise $11 million, Business Insider.
