While AI agents have the potential to increase enterprise automation levels, their real-world impact has so far been limited. AI hallucination rates and a lack of reliable tools for agents hold them back.
Since most white-collar work takes place on the web, this benchmark focuses on remote browsers designed for AI agents, which enable them to search the web, navigate, log in, and interact with websites. While many remote browser solutions have high failure rates, leading providers achieve high success rates.
Top remote browser benchmark results
Here are the top remote browsers based on their capabilities and performance during our benchmark:
Provider | Overall | Success rate for browser automation | Speed | Features | Security certifications |
---|---|---|---|---|---|
Bright Data | 99% | 100% | 100% | 95% | 100% |
BrowserAI | 64% | 90% | 80% | 86% | 0% |
Hyperbrowser | 45% | 70% | 69% | 41% | 0% |
Browserbase | 69% | 50% | 76% | 50% | 100% |
Steel.dev | 38% | 50% | 62% | 41% | 0% |
Airtop | 54% | 40% | 27% | 50% | 100% |
Zenrows | 32% | 40% | 59% | 27% | 0% |
Anchor browser | 31% | 30% | 20% | 73% | 0% |
The overall score is the average of all other scores. For example, Bright Data's overall score of 99% is the average of its 100%, 100%, 95%, and 100% component scores. Each component of our scoring system is explained below:
Success rate
The benchmark results show clear differences in capability among the leading providers:
- Bright Data has achieved a 100% success rate.
- BrowserAI and Hyperbrowser have a success rate of 90% and 70%, respectively.
- Airtop, Zenrows, and Anchor Browser have lower success rates (40%, 40%, and 30%, respectively).
To understand how we calculated these results, please see our remote browser methodology.
Speed
- Bright Data has the highest speed score (100%).
- BrowserAI has the shortest browser startup time (1.1 sec on average).
- Anchor Browser has the longest browsing time (376 sec on average) and the lowest speed score (20%).
- Airtop has the slowest browser startup time (19.5 sec on average).
Speed score quantifies the throughput of the remote browser service, representing the number of successful tasks completed per defined unit of time. It reflects overall efficiency and processing capacity.
Browsing time for correct results (avg) measures the average time elapsed specifically during the remote browser’s active interaction with web pages for successfully completed, individual tasks. This includes time spent on page navigation, JavaScript rendering, and direct element interactions (e.g., clicks, typing).
- This metric excludes any deliberate agent-side delays or processing times of external components like Large Language Models (LLMs).
Browser startup time (avg) measures the average time taken for the remote browser session to become ready, after the initial request to create or connect to a session is made.
Total time for correct results (avg) represents the average end-to-end duration for completed individual tasks.
- This metric includes browser startup time, all active browsing/interaction times, any agent-side processing or deliberate delays, and communication latencies with external services (e.g., LLMs) that are part of the task’s execution flow.
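To make these definitions concrete, the sketch below shows how such timings can be captured with Playwright. The CDP endpoint, URL, and selector are hypothetical placeholders, not any provider's actual API; a real harness would also account for agent-side and LLM latencies as described above.

```python
# Minimal timing sketch, assuming a generic Playwright CDP endpoint.
# All values below are placeholders, not a specific provider's API.
import time
from playwright.sync_api import sync_playwright

CDP_ENDPOINT = "wss://remote-browser.example.com"  # hypothetical endpoint

with sync_playwright() as p:
    t0 = time.monotonic()
    browser = p.chromium.connect_over_cdp(CDP_ENDPOINT)
    page = browser.new_page()
    startup_time = time.monotonic() - t0  # browser startup time

    t1 = time.monotonic()
    page.goto("https://example.com")
    page.click("a")  # active interaction with the page
    browsing_time = time.monotonic() - t1  # browsing time, no LLM calls included

    browser.close()
    total_time = time.monotonic() - t0  # end-to-end duration for the task

print(f"startup: {startup_time:.2f}s, browsing: {browsing_time:.2f}s, total: {total_time:.2f}s")
```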
Features
Features provided by the top providers are outlined below. The feature score is calculated for each capability following our methodology and then averaged across all features. For features that can take on multiple values (e.g., programming language support), the product providing the highest number of values (e.g., the product supporting the most programming languages) receives a full score of 1, while others are scored pro rata. For example, if the leading product supports five programming languages, a product supporting three receives a score of 3/5 = 0.6.
The following sections detail the capabilities of these services:
Technical capabilities & error handling
Technical capabilities give developers the flexibility to work with various websites without building and maintaining custom code modules:
Provider | Login | CAPTCHA solving | JS interaction | Error handling – 404 | Error handling – 301 |
---|---|---|---|---|---|
Bright Data | ✅ | ✅ | ✅ | ❌ | ✅ |
Browserbase | ❌ | ✅ | ✅ | ✅ | ❌ |
Hyperbrowser | ❌ | ✅ | ✅ | ❌ | ❌ |
BrowserAI | ✅ | ✅ | ✅ | ❌ | ✅ |
Airtop | ❌ | ✅ | ✅ | ✅ | ❌ |
Steel.dev | ❌ | ✅ | ✅ | ✅ | ❌ |
Anchor browser | ❌ | ✅ | ❌ | ✅ | ✅ |
Zenrows | ❌ | ❌ | ✅ | ✅ | ❌ |
CAPTCHA solving: This feature automatically detects and solves a wide range of CAPTCHA types, including image-based, hCaptcha, reCAPTCHA, and Cloudflare challenges. The service also handles rate-limited CAPTCHA prompts and adapts to evolving CAPTCHA mechanisms, ensuring consistent access to protected websites.
Error handling: This feature evaluates the default behavior of the service for standard HTTP status codes that are critical for reliable navigation:
- 404 (Not Found) Awareness: The system’s ability to detect and report ‘Not Found’ errors, enabling agents to handle missing pages appropriately. We tested by navigating to a non-existent URL and verifying if the agent receives a clear indication of the 404 error from the service, rather than a masked response (e.g., a generic error page served with a 200 OK status).
- 301/302 (Redirect) Management: Automatic following of redirects to ensure the agent arrives at the correct final URL. We tested by accessing a URL known to issue a redirect and confirming that the agent is navigated to the final destination URL without manual intervention.
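The sketch below illustrates how both checks can be run with Playwright. The endpoint and URLs are placeholders; the actual tests used URLs known to return a 404 or a redirect.

```python
# Sketch of the 404 and redirect checks, assuming a generic Playwright
# CDP connection. The endpoint and URLs are illustrative placeholders.
from playwright.sync_api import sync_playwright

def check_error_handling(cdp_endpoint: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint)
        page = browser.new_page()

        # 404 awareness: the service should surface the real status code,
        # not mask it with a generic error page served as 200 OK.
        resp = page.goto("https://example.com/no-such-page-12345")
        print("404 surfaced:", resp is not None and resp.status == 404)

        # Redirect management: a URL known to redirect should land the agent
        # on the final destination without manual intervention.
        page.goto("http://example.com")  # placeholder for a redirecting URL
        print("final URL:", page.url)

        browser.close()
```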
JavaScript interaction: This feature handles JavaScript-heavy websites and supports emulating user interactions.
- JavaScript Execution: Fully renders JavaScript to access dynamically loaded content.
- Browser Action Automation: Supports programmatic interactions such as clicking elements, typing text into fields, scrolling pages (including infinite scroll), waiting for specific elements to appear or for a set duration, and handling pop-ups or modals.
- Element Selection: Provides methods for selecting elements, including CSS selectors and XPath.
Login: This feature refers to the ability to enter usernames, passwords, and other credentials into login forms and simulate the submission of these forms (e.g., by clicking login buttons). This typically relies on the basic browser automation engine’s ability to interact with web elements.
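As an illustration, the following Playwright sketch exercises the interactions described above: a login flow, waiting for dynamically rendered content, and scrolling. The endpoint, page URL, selectors, and credentials are all hypothetical.

```python
# Illustrative sketch of JS interaction and login automation.
# Endpoint, URL, selectors, and credentials are hypothetical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    page = browser.new_page()
    page.goto("https://example.com/login")  # hypothetical login page

    # Login: fill credentials and simulate form submission.
    page.fill("input[name='username']", "agent@example.com")
    page.fill("input[name='password']", "s3cret")
    page.click("button[type='submit']")

    # Wait for a dynamically rendered element to confirm the action succeeded.
    page.wait_for_selector("#dashboard", timeout=10_000)

    # Scroll to trigger lazily loaded (infinite-scroll) content.
    page.mouse.wheel(0, 2000)

    browser.close()
```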
Programming language
Programming language coverage allows developers to port their existing code to remote browser platforms.
Provider | Number of programming languages | Supported programming languages |
---|---|---|
Bright Data | 3 | Node.js, Python, C# |
Browserbase | 2 | Node.js, Python |
Hyperbrowser | 2 | Node.js, Python |
BrowserAI | 3 | Node.js, Java, C# |
Airtop | 3 | Python, Node.js, Go |
Steel.dev | 2 | Node.js, Python |
Anchor browser | 5 | Python, JavaScript, PHP, Go, Java |
Zenrows | 1 | Node.js |
This feature evaluates the scope of programming language compatibility offered by the service. A higher number of supported languages signifies flexibility for development teams, allowing them to integrate the remote browser capabilities using their preferred or existing tech stack.
Session management
Session management is necessary for longer, multi-step interactions (e.g., purchasing a flight ticket) on the same website:
Provider | Session persistence | State preservation | Cookie handling |
---|---|---|---|
Bright Data | ✅ | ✅ | ✅ |
Browserbase | ✅ | ✅ | ✅ |
Hyperbrowser | ✅ | ✅ | ❌ |
BrowserAI | ✅ | ✅ | ✅ |
Airtop | ✅ | ❌ | ✅ |
Steel.dev | ✅ | ✅ | ✅ |
Anchor browser | ✅ | ✅ | ✅ |
Zenrows | ✅ | ✅ | ❌ |
This feature evaluates the service’s ability to manage and maintain state across multiple interactions within a browsing session.
- Session Persistence: Support for maintaining a consistent session ID across multiple requests or actions, allowing for multi-step workflows.
- Cookie Handling: Capabilities to automatically manage cookies (store, send, clear) or allow users to inject/manage custom cookies for maintaining logged-in states or specific site preferences.
- State Preservation: The ability to preserve the browser’s state (e.g., filled forms, scrolled positions) across a sequence of actions within a single task.
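A minimal Playwright sketch of cookie injection and state export is shown below. Whether this state survives across sessions depends on the provider; the endpoint and cookie values are placeholders.

```python
# Sketch of cookie injection and state reuse with Playwright.
# Endpoint and cookie values are hypothetical placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    context = browser.new_context()

    # Cookie handling: inject a custom cookie to maintain a logged-in state.
    context.add_cookies([{
        "name": "session_id",
        "value": "abc123",          # hypothetical session token
        "domain": "example.com",
        "path": "/",
    }])

    page = context.new_page()
    page.goto("https://example.com/account")

    # State preservation: export cookies and local storage so a later step
    # in the same multi-step workflow can resume where this one ended.
    context.storage_state(path="state.json")

    browser.close()
```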
Geo coverage
Geographic coverage includes both country-level coverage, so enterprises can access global websites, and granular coverage, i.e., the ability to use ASN- or ZIP-code-based targeting.
Provider | City-Level Targeting | ZIP-Code Targeting | ASN Targeting |
---|---|---|---|
Bright Data | ✅ | ✅ | ✅ |
Browserbase | ✅ | ❌ | ❌ |
Hyperbrowser | ✅ | ❌ | ❌ |
BrowserAI | ✅ | ✅ | ✅ |
Airtop | ✅ | ❌ | ❌ |
Steel.dev | ❌ | ❌ | ❌ |
Anchor browser | ❌ | ❌ | ❌ |
Zenrows | ❌ | ❌ | ❌ |
City-Level Targeting: The ability to specify a particular city as the origin for web requests. This allows for highly localized data retrieval and testing, reflecting what users in a specific urban area would see.
ZIP-Code / Postal Code Targeting: The capability to target requests based on specific ZIP codes or postal codes. This is especially relevant for e-commerce (checking local product availability, pricing, shipping options) and services with hyperlocal variations.
ASN (Autonomous System Number) Targeting: The option to route requests through specific Internet Service Providers (ISPs) or network blocks identified by their ASN. This advanced targeting can be useful for mimicking traffic from particular network segments or for very specific unblocking strategies.
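The underlying mechanism is typically a geo-targeted proxy. The sketch below shows the general pattern via Playwright's proxy option (using a local launch for simplicity); the actual configuration is provider-specific, often via session-creation API parameters or credential-encoded targeting, and every value below is a placeholder.

```python
# Illustrative only: many providers encode geo targeting (country, city,
# ZIP, or ASN) in proxy credentials, but the exact syntax is provider-
# specific. Every value below is a hypothetical placeholder.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        "server": "http://proxy.example.com:22225",
        "username": "customer-id-country-us-city-newyork",  # hypothetical format
        "password": "secret",
    })
    page = browser.new_page()
    page.goto("https://example.com")  # request exits from the targeted location
    browser.close()
```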
Integrations
Integrations with browser automation libraries or protocols like MCP facilitate agent use:
Provider | Puppeteer | Playwright | Selenium | MCP |
---|---|---|---|---|
Bright Data | ✅ | ✅ | ✅ | ✅ |
Browserbase | ✅ | ✅ | ✅ | ✅ |
Hyperbrowser | ✅ | ✅ | ✅ | ✅ |
BrowserAI | ✅ | ✅ | ✅ | ❌ |
Airtop | ❌ | ❌ | ❌ | ❌ |
Steel.dev | ✅ | ✅ | ✅ | ❌ |
Anchor browser | ❌ | ✅ | ❌ | ❌ |
Zenrows | ✅ | ✅ | ❌ | ❌ |
Playwright Compatibility: Assesses the ability to connect to and control remote browser sessions using Playwright.
Puppeteer Compatibility: Evaluates integration with Puppeteer, often utilizing puppeteer-core for connecting to remote browser instances.
Selenium Compatibility: Measures support for controlling remote browser sessions via Selenium WebDriver.
MCP (Model Context Protocol) Support: Indicates whether the service offers integration with the Model Context Protocol. MCP is designed to facilitate structured data exchange between tools (such as browsers) and AI models (LLMs), enabling AI agents to understand web content better and utilize it more effectively.
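For illustration, the sketch below shows the two common connection patterns with placeholder endpoints: Playwright (like Puppeteer) attaches over CDP, while Selenium points its WebDriver client at a remote endpoint.

```python
# Two common connection patterns, with hypothetical endpoints.
from playwright.sync_api import sync_playwright
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Playwright: attach to a running remote browser over CDP.
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    browser.close()

# Selenium: point the WebDriver client at the provider's remote endpoint.
driver = webdriver.Remote(
    command_executor="https://webdriver.example.com",  # hypothetical endpoint
    options=Options(),
)
driver.quit()
```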
Search engines
Provider | Google | Bing | DuckDuckGo | Baidu |
---|---|---|---|---|
Bright Data | ✅ | ✅ | ✅ | ✅ |
Browserbase | ✅ | ✅ | ❌ | ❌ |
Hyperbrowser | ✅ | ✅ | ❌ | ❌ |
BrowserAI | ✅ | ✅ | ✅ | ✅ |
Airtop | ✅ | ✅ | ❌ | ❌ |
Steel.dev | ✅ | ✅ | ❌ | ❌ |
Anchor browser | ✅ | ✅ | ✅ | ✅ |
Zenrows | ✅ | ✅ | ❌ | ❌ |
This feature assesses whether the remote browser service offers specialized features or optimized support for extracting structured data directly from major search engine results pages (SERPs), such as Google, Bing, DuckDuckGo, and Baidu.
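A minimal sketch of SERP extraction through a remote browser is shown below. The result-block selector is illustrative; SERP markup changes frequently, which is precisely why provider-side SERP support matters.

```python
# Sketch of SERP extraction via a remote browser. The endpoint is a
# placeholder, and the selector is illustrative and brittle by nature.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    page = browser.new_page()
    page.goto("https://www.bing.com/search?q=remote+browsers")
    # Collect organic result titles; "li.b_algo h2" matches Bing's markup
    # at the time of writing, but such selectors change often.
    titles = page.locator("li.b_algo h2").all_inner_texts()
    print(titles)
    browser.close()
```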
Security
Provider | ISO 27001 | SOC2 | ISO 27018 (PII) |
---|---|---|---|
Bright Data | ✅ | ✅ | ✅ |
Browserbase | ❌ | ✅ | ❌ |
Hyperbrowser | ❌ | ❌ | ❌ |
BrowserAI | ❌ | ❌ | ❌ |
Airtop | ❌ | ✅ | ❌ |
Steel.dev | ❌ | ❌ | ❌ |
Anchor browser | ❌ | ❌ | ❌ |
Zenrows | ❌ | ❌ | ❌ |
Data security is critical for agents, especially those that will carry out actions on enterprise systems. We assessed, based on their websites, whether the builders of these agent browsers hold data security certifications. If a provider listed none of these three certifications, we assigned it a security score of 0.
Participants
We are analyzing products that provide cloud-based remote browsers suitable for AI agent automation. For this benchmark, we focused on services offering programmatic control via the Playwright automation library, as this:
- Enables direct and flexible web interaction for AI agents
- Allows us to test all providers using the same code base (sketched below)
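The harness idea can be sketched as follows: because every benchmarked service exposes a Playwright-compatible endpoint, the same task code runs against each provider. The endpoint URLs are placeholders.

```python
# Sketch of a provider-agnostic test harness. Endpoints are hypothetical.
from playwright.sync_api import sync_playwright

PROVIDERS = {  # placeholder endpoint URLs
    "provider_a": "wss://a.example.com",
    "provider_b": "wss://b.example.com",
}

def run_task(cdp_endpoint: str) -> str:
    """Run the same task against any Playwright-compatible endpoint."""
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint)
        page = browser.new_page()
        page.goto("https://example.com")
        title = page.title()
        browser.close()
        return title

for name, endpoint in PROVIDERS.items():
    print(name, run_task(endpoint))
```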
Benchmarked products:
- Bright Data
- Anchor Browser (anchorbrowser.io)
- Airtop (airtop.ai)
- BrowserAI
- Browserbase (browserbase.com)
- Hyperbrowser (hyperbrowser.ai)
- Steel.dev
- Zenrows
Remote browser requirements for AI agent types
The requirements for remote browsers vary depending on the type and intended use of the AI agent employing them. AI agents can be broadly categorized by their operational mode, which in turn dictates specific demands on the remote browser infrastructure:
- Backend AI Agents: These agents typically operate autonomously or with minimal direct human oversight, often triggered by system events or scheduled tasks. They require remote browsers optimized for stability, scalability, and robust error handling during prolonged operations.
- Real-time AI Agents: These agents interact directly with end-users who are actively waiting for a response. For these, remote browsers must prioritize low latency, high responsiveness, and consistent performance.
Backend agents
Typical use cases & agents:
- Applicant tracking & management
- AI SDR
- Meeting scheduling
- Price monitoring
- Web automation
Real-time agents
Typical use cases & agents:
- Research: OpenAI Deep Research
- Financial analyst
Additional requirements
- Fast responses
- Infrastructure stability for real-time use (i.e. response times should not degrade with parallel use).
Remote browser benchmark methodology
In the benchmark, we:
- Used an agent that leverages a frontier LLM and the agent browsers.
- Collected metrics for all web requests sent from these agents. If the choice of agent browser had an impact on the result (e.g., due to issues in the retrieved data), we noted it.
A successful request meets the following criteria:
- HTTP response code: 200.
- Returned content is of reasonable size. The size threshold is determined by comparing the content sizes of responses from all providers: if a response is smaller than half the average response size for that specific URL, it is classified as incorrect or missing (see the sketch below).
- The page contains a specific CSS selector expected for that type of page. Where we had not identified a CSS selector for a specific page, we leveraged an LLM to flag partial or empty responses.
Finally, we validated our approach by cross-checking actual responses to ensure that the answers assumed to be incorrect are actually incorrect, ensuring a high true negative rate.
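The size-threshold rule can be expressed as a short sketch; the byte counts in the example are hypothetical.

```python
# Minimal sketch of the size-threshold rule: a response is flagged if it
# is smaller than half the average response size collected for the same
# URL across all providers. Sizes below are hypothetical.
from statistics import mean

def classify_by_size(sizes_for_url: list[int], candidate_size: int) -> str:
    threshold = mean(sizes_for_url) / 2
    return "ok" if candidate_size >= threshold else "incorrect_or_missing"

# Example: responses of 100 kB, 95 kB, and 12 kB for the same URL.
sizes = [100_000, 95_000, 12_000]
print(classify_by_size(sizes, 12_000))  # -> incorrect_or_missing
```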
Task 1: Backend agent AI buyer
Scenario: The sales team describes potential gift ideas for customers' special days (e.g., birthdays) to the AI buyer, along with a budget. The AI buyer then crawls Temu and other e-commerce websites that allow harmless bot activity to identify the best gifts. The agent then buys these gifts.
Test cases: 3 gifts.
Necessary steps: Website search, navigation, filling forms/fields, and purchase
Task 2: Backend agent AI SDR for lead generation
Scenario: The AI SDR receives a list of companies and a description of the ideal lead from the marketing team. It then crawls online sources like LinkedIn to identify the right leads, searches the web to craft personalized outbound messages for them, and sends these messages via social media or email.
Test cases: 10 companies. Each company could yield a few leads.
Necessary steps: Navigation, filling forms/fields, web search
Necessary capabilities: Navigation, filling forms/fields
Challenges & mitigations
Though we aim to run exactly the same test for all remote browsers, there are some challenges:
- LLMs are probabilistic; as a result, our agents may direct different agent browsers to different websites. Mitigations:
  - We leveraged guardrails and a low temperature setting to minimize variations (see the sketch below).
  - We made queries as specific as possible.
  - We ran each agent multiple times (e.g., 5) to ensure that all tested solutions received similar requests.
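As a minimal illustration of the low-temperature mitigation, the sketch below uses the OpenAI Python SDK; any LLM client with a temperature parameter works the same way. The model name and prompt are placeholders.

```python
# Sketch of the low-temperature mitigation, assuming the OpenAI Python SDK.
# Model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    temperature=0,   # near-deterministic output minimizes run-to-run variation
    messages=[{"role": "user", "content": "Go to example.com and extract the page title."}],
)
print(response.choices[0].message.content)
```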
Why are remote browsers important?
If an AI agent must perform human-like actions online, it needs a remote browser for effective interaction. Typical actions, such as product searches, form filling, and site navigation, require a robust browsing infrastructure that mimics human behavior. Without such an infrastructure, browsing activities get blocked by anti-scraping measures. Thus, the quality of remote browsers has a significant impact on the success rates and performance of AI agents.