While AI agents have the potential to increase enterprise automation levels, their real-world impact has so far been limited. AI hallucination rates and a lack of reliable tools for agents hold them back.
Since most white-collar work takes place on the web, this benchmark focuses on remote browsers designed for AI agents, which enable them to search the web, navigate, log in, and interact with websites. While many remote browser solutions have high failure rates, leading providers achieve high success rates.
Top remote browser benchmark results
Here are the top remote browsers based on their capabilities and performance during our benchmark:
Provider | Overall | Success rate for browser automation | Speed | Features | Security certifications |
---|---|---|---|---|---|
Bright Data | 99% | 100% | 100% | 95% | 100% |
BrowserAI | 64% | 90% | 80% | 86% | 0% |
Hyperbrowser | 45% | 70% | 69% | 41% | 0% |
Browserbase | 69% | 50% | 76% | 50% | 100% |
Steel.dev | 38% | 50% | 62% | 41% | 0% |
Airtop | 54% | 40% | 27% | 50% | 100% |
Zenrows | 32% | 40% | 59% | 27% | 0% |
Anchor browser | 31% | 30% | 20% | 73% | 0% |
The overall score is the average of all other scores. For example, Bright Data's overall score of 99% is the average of its 100%, 100%, 95%, and 100% component scores. Each component of our scoring system is explained below:
Success rate
The benchmark results show clear differences in capability among the leading providers:
- Bright Data has achieved a 100% success rate.
- BrowserAI and Hyperbrowser have a success rate of 90% and 70%, respectively.
- Airtop, Zenrows, and Anchor Browser have lower success rates (40%, 40%, and 30%, respectively).
To understand how we calculated these results, please see our remote browser methodology.
Speed
- Bright Data has the highest speed score (100%).
- BrowserAI has the shortest browser startup time (1.1 sec on average).
- Anchor Browser has the longest browsing time (376 sec on average) and the lowest speed score (20%).
- Airtop has the slowest browser startup time (19.5 sec on average).
Speed score quantifies the throughput of the remote browser service, representing the number of successful tasks completed per defined unit of time. It reflects overall efficiency and processing capacity.
Browsing time for correct results (avg) measures the average time elapsed specifically during the remote browser’s active interaction with web pages for successfully completed, individual tasks. This includes time spent on page navigation, JavaScript rendering, and direct element interactions (e.g., clicks, typing).
- This metric excludes any deliberate agent-side delays or processing times of external components like Large Language Models (LLMs).
Browser startup time (avg) measures the average time taken for the remote browser session to become ready, after the initial request to create or connect to a session is made.
Total time for correct results (avg) represents the average end-to-end duration for completed individual tasks.
- This metric includes browser startup time, all active browsing/interaction times, any agent-side processing or deliberate delays, and communication latencies with external services (e.g., LLMs) that are part of the task’s execution flow.
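To make these definitions concrete, the sketch below shows how such timings can be captured with Playwright. The CDP endpoint, URL, and selector are hypothetical placeholders, not any provider's actual API; a real harness would also account for agent-side and LLM latencies as described above.

```python
# Minimal timing sketch, assuming a generic Playwright CDP endpoint.
# All values below are placeholders, not a specific provider's API.
import time
from playwright.sync_api import sync_playwright

CDP_ENDPOINT = "wss://remote-browser.example.com"  # hypothetical endpoint

with sync_playwright() as p:
    t0 = time.monotonic()
    browser = p.chromium.connect_over_cdp(CDP_ENDPOINT)
    page = browser.new_page()
    startup_time = time.monotonic() - t0  # browser startup time

    t1 = time.monotonic()
    page.goto("https://example.com")
    page.click("a")  # active interaction with the page
    browsing_time = time.monotonic() - t1  # browsing time, no LLM calls included

    browser.close()
    total_time = time.monotonic() - t0  # end-to-end duration for the task

print(f"startup: {startup_time:.2f}s, browsing: {browsing_time:.2f}s, total: {total_time:.2f}s")
```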
Features
Features provided by the top providers are outlined below. The feature score is calculated for each capability following our methodology and then averaged across all features. For features that can take on multiple values (e.g., programming language support), the product providing the highest number of values (e.g., the product supporting the most programming languages) receives a full score of 1, while others are scored pro rata. For example, if the leading product supports five programming languages, a product supporting three receives a score of 3/5 = 0.6.
The following sections detail the capabilities of these services:
Technical capabilities & error handling
Technical capabilities give developers the flexibility to work with various websites without building and maintaining custom code modules:
Provider | Login | CAPTCHA solving | JS interaction | Error handling – 404 | Error handling – 301 |
---|---|---|---|---|---|
Bright Data | ✅ | ✅ | ✅ | ❌ | ✅ |
Browserbase | ❌ | ✅ | ✅ | ✅ | ❌ |
Hyperbrowser | ❌ | ✅ | ✅ | ❌ | ❌ |
BrowserAI | ✅ | ✅ | ✅ | ❌ | ✅ |
Airtop | ❌ | ✅ | ✅ | ✅ | ❌ |
Steel.dev | ❌ | ✅ | ✅ | ✅ | ❌ |
Anchor browser | ❌ | ✅ | ❌ | ✅ | ✅ |
Zenrows | ❌ | ❌ | ✅ | ✅ | ❌ |
CAPTCHA solving: This feature automatically detects and solves a wide range of CAPTCHA types, including image-based, hCaptcha, reCAPTCHA, and Cloudflare challenges. The service also handles rate-limited CAPTCHA prompts and adapts to evolving CAPTCHA mechanisms, ensuring consistent access to protected websites.
Error handling: This feature evaluates the default behavior of the service for standard HTTP status codes that are critical for reliable navigation:
- 404 (Not Found) Awareness: The system’s ability to detect and report ‘Not Found’ errors, enabling agents to handle missing pages appropriately. We tested by navigating to a non-existent URL and verifying if the agent receives a clear indication of the 404 error from the service, rather than a masked response (e.g., a generic error page served with a 200 OK status).
- 301/302 (Redirect) Management: Automatic following of redirects to ensure the agent arrives at the correct final URL. We tested by accessing a URL known to issue a redirect and confirming that the agent is navigated to the final destination URL without manual intervention.
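The sketch below illustrates how both checks can be run with Playwright. The endpoint and URLs are placeholders; the actual tests used URLs known to return a 404 or a redirect.

```python
# Sketch of the 404 and redirect checks, assuming a generic Playwright
# CDP connection. The endpoint and URLs are illustrative placeholders.
from playwright.sync_api import sync_playwright

def check_error_handling(cdp_endpoint: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint)
        page = browser.new_page()

        # 404 awareness: the service should surface the real status code,
        # not mask it with a generic error page served as 200 OK.
        resp = page.goto("https://example.com/no-such-page-12345")
        print("404 surfaced:", resp is not None and resp.status == 404)

        # Redirect management: a URL known to redirect should land the agent
        # on the final destination without manual intervention.
        page.goto("http://example.com")  # placeholder for a redirecting URL
        print("final URL:", page.url)

        browser.close()
```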
JavaScript interaction: This feature handles JavaScript-heavy websites and supports emulating user interactions.
- JavaScript Execution: Fully renders JavaScript to access dynamically loaded content.
- Browser Action Automation: Supports programmatic interactions such as clicking elements, typing text into fields, scrolling pages (including infinite scroll), waiting for specific elements to appear or for a set duration, and handling pop-ups or modals.
- Element Selection: Provides methods for selecting elements, including CSS selectors and XPath.
Login: This feature refers to the ability to enter usernames, passwords, and other credentials into login forms and simulate the submission of these forms (e.g., by clicking login buttons). This typically relies on the basic browser automation engine’s ability to interact with web elements.
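As an illustration, the following Playwright sketch exercises the interactions described above: a login flow, waiting for dynamically rendered content, and scrolling. The endpoint, page URL, selectors, and credentials are all hypothetical.

```python
# Illustrative sketch of JS interaction and login automation.
# Endpoint, URL, selectors, and credentials are hypothetical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    page = browser.new_page()
    page.goto("https://example.com/login")  # hypothetical login page

    # Login: fill credentials and simulate form submission.
    page.fill("input[name='username']", "agent@example.com")
    page.fill("input[name='password']", "s3cret")
    page.click("button[type='submit']")

    # Wait for a dynamically rendered element to confirm the action succeeded.
    page.wait_for_selector("#dashboard", timeout=10_000)

    # Scroll to trigger lazily loaded (infinite-scroll) content.
    page.mouse.wheel(0, 2000)

    browser.close()
```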
Programming language
Programming language coverage allows developers to port their existing code to remote browser platforms.
Provider | Number of programming languages | Supported programming languages |
---|---|---|
Bright Data | 3 | Node.js, Python, C# |
Browserbase | 2 | Node.js, Python |
Hyperbrowser | 2 | Node.js, Python |
BrowserAI | 3 | Node.js, Java, C# |
Airtop | 3 | Python, Node.js, Go |
Steel.dev | 2 | Node.js, Python |
Anchor browser | 5 | Python, JavaScript, PHP, Go, Java |
Zenrows | 1 | Node.js |
This feature evaluates the scope of programming language compatibility offered by the service. A higher number of supported languages signifies flexibility for development teams, allowing them to integrate the remote browser capabilities using their preferred or existing tech stack.
Session management
Session management is necessary for longer, multi-step interactions (e.g., purchasing a flight ticket) on the same website:
Provider | Session persistence | State preservation | Cookie handling |
---|---|---|---|
Bright Data | ✅ | ✅ | ✅ |
Browserbase | ✅ | ✅ | ✅ |
Hyperbrowser | ✅ | ✅ | ❌ |
BrowserAI | ✅ | ✅ | ✅ |
Airtop | ✅ | ❌ | ✅ |
Steel.dev | ✅ | ✅ | ✅ |
Anchor browser | ✅ | ✅ | ✅ |
Zenrows | ✅ | ✅ | ❌ |
This feature evaluates the service’s ability to manage and maintain state across multiple interactions within a browsing session.
- Session Persistence: Support for maintaining a consistent session ID across multiple requests or actions, allowing for multi-step workflows.
- Cookie Handling: Capabilities to automatically manage cookies (store, send, clear) or allow users to inject/manage custom cookies for maintaining logged-in states or specific site preferences.
- State Preservation: The ability to preserve the browser’s state (e.g., filled forms, scrolled positions) across a sequence of actions within a single task.
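A minimal Playwright sketch of cookie injection and state export is shown below. Whether this state survives across sessions depends on the provider; the endpoint and cookie values are placeholders.

```python
# Sketch of cookie injection and state reuse with Playwright.
# Endpoint and cookie values are hypothetical placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    context = browser.new_context()

    # Cookie handling: inject a custom cookie to maintain a logged-in state.
    context.add_cookies([{
        "name": "session_id",
        "value": "abc123",          # hypothetical session token
        "domain": "example.com",
        "path": "/",
    }])

    page = context.new_page()
    page.goto("https://example.com/account")

    # State preservation: export cookies and local storage so a later step
    # in the same multi-step workflow can resume where this one ended.
    context.storage_state(path="state.json")

    browser.close()
```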
Geo coverage
Geographic coverage includes both country-level coverage, so enterprises can access global websites, and granular coverage, i.e., the ability to use ASN- or ZIP-code-based targeting.
Provider | City-Level Targeting | ZIP-Code Targeting | ASN Targeting |
---|---|---|---|
Bright Data | ✅ | ✅ | ✅ |
Browserbase | ✅ | ❌ | ❌ |
Hyperbrowser | ✅ | ❌ | ❌ |
BrowserAI | ✅ | ✅ | ✅ |
Airtop | ✅ | ❌ | ❌ |
Steel.dev | ❌ | ❌ | ❌ |
Anchor browser | ❌ | ❌ | ❌ |
Zenrows | ❌ | ❌ | ❌ |
City-Level Targeting: The ability to specify a particular city as the origin for web requests. This allows for highly localized data retrieval and testing, reflecting what users in a specific urban area would see.
ZIP-Code / Postal Code Targeting: The capability to target requests based on specific ZIP codes or postal codes. This is especially relevant for e-commerce (checking local product availability, pricing, shipping options) and services with hyperlocal variations.
ASN (Autonomous System Number) Targeting: The option to route requests through specific Internet Service Providers (ISPs) or network blocks identified by their ASN. This advanced targeting can be useful for mimicking traffic from particular network segments or for very specific unblocking strategies.
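The underlying mechanism is typically a geo-targeted proxy. The sketch below shows the general pattern via Playwright's proxy option (using a local launch for simplicity); the actual configuration is provider-specific, often via session-creation API parameters or credential-encoded targeting, and every value below is a placeholder.

```python
# Illustrative only: many providers encode geo targeting (country, city,
# ZIP, or ASN) in proxy credentials, but the exact syntax is provider-
# specific. Every value below is a hypothetical placeholder.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        "server": "http://proxy.example.com:22225",
        "username": "customer-id-country-us-city-newyork",  # hypothetical format
        "password": "secret",
    })
    page = browser.new_page()
    page.goto("https://example.com")  # request exits from the targeted location
    browser.close()
```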
Integrations
Integrations with browser automation libraries or protocols like MCP facilitate agent use:
Provider | Puppeteer | Playwright | Selenium | MCP |
---|---|---|---|---|
Bright Data | ✅ | ✅ | ✅ | ✅ |
Browserbase | ✅ | ✅ | ✅ | ✅ |
Hyperbrowser | ✅ | ✅ | ✅ | ✅ |
BrowserAI | ✅ | ✅ | ✅ | ❌ |
Airtop | ❌ | ❌ | ❌ | ❌ |
Steel.dev | ✅ | ✅ | ✅ | ❌ |
Anchor browser | ❌ | ✅ | ❌ | ❌ |
Zenrows | ✅ | ✅ | ❌ | ❌ |
Playwright Compatibility: Assesses the ability to connect to and control remote browser sessions using Playwright.
Puppeteer Compatibility: Evaluates integration with Puppeteer, often utilizing puppeteer-core for connecting to remote browser instances.
Selenium Compatibility: Measures support for controlling remote browser sessions via Selenium WebDriver.
MCP (Model Context Protocol) Support: Indicates whether the service offers integration with the Model Context Protocol. MCP is designed to facilitate structured data exchange between tools (such as browsers) and AI models (LLMs), enabling AI agents to understand web content better and utilize it more effectively.
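For illustration, the sketch below shows the two common connection patterns with placeholder endpoints: Playwright (like Puppeteer) attaches over CDP, while Selenium points its WebDriver client at a remote endpoint.

```python
# Two common connection patterns, with hypothetical endpoints.
from playwright.sync_api import sync_playwright
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Playwright: attach to a running remote browser over CDP.
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    browser.close()

# Selenium: point the WebDriver client at the provider's remote endpoint.
driver = webdriver.Remote(
    command_executor="https://webdriver.example.com",  # hypothetical endpoint
    options=Options(),
)
driver.quit()
```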
Search engines
Provider | Google | Bing | DuckDuckGo | Baidu |
---|---|---|---|---|
Bright Data | ✅ | ✅ | ✅ | ✅ |
Browserbase | ✅ | ✅ | ❌ | ❌ |
Hyperbrowser | ✅ | ✅ | ❌ | ❌ |
BrowserAI | ✅ | ✅ | ✅ | ✅ |
Airtop | ✅ | ✅ | ❌ | ❌ |
Steel.dev | ✅ | ✅ | ❌ | ❌ |
Anchor browser | ✅ | ✅ | ✅ | ✅ |
Zenrows | ✅ | ✅ | ❌ | ❌ |
This feature assesses whether the remote browser service offers specialized features or optimized support for extracting structured data directly from major search engine results pages (SERPs), such as Google, Bing, DuckDuckGo, and Baidu.
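A minimal sketch of SERP extraction through a remote browser is shown below. The result-block selector is illustrative; SERP markup changes frequently, which is precisely why provider-side SERP support matters.

```python
# Sketch of SERP extraction via a remote browser. The endpoint is a
# placeholder, and the selector is illustrative and brittle by nature.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com")
    page = browser.new_page()
    page.goto("https://www.bing.com/search?q=remote+browsers")
    # Collect organic result titles; "li.b_algo h2" matches Bing's markup
    # at the time of writing, but such selectors change often.
    titles = page.locator("li.b_algo h2").all_inner_texts()
    print(titles)
    browser.close()
```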
Security
Provider | ISO 27001 | SOC2 | ISO 27018 (PII) |
---|---|---|---|
Bright Data | ✅ | ✅ | ✅ |
Browserbase | ❌ | ✅ | ❌ |
Hyperbrowser | ❌ | ❌ | ❌ |
BrowserAI | ❌ | ❌ | ❌ |
Airtop | ❌ | ✅ | ❌ |
Steel.dev | ❌ | ❌ | ❌ |
Anchor browser | ❌ | ❌ | ❌ |
Zenrows | ❌ | ❌ | ❌ |
Data security is critical for agents, especially those that will carry out actions on enterprise systems. We assessed, based on their websites, whether the builders of these agent browsers hold data security certifications. If a provider listed none of these three certifications, we assigned it a security score of 0.
Participants
We are analyzing products that provide cloud-based remote browsers suitable for AI agent automation. For this benchmark, we focused on services offering programmatic control via the Playwright automation library, as this:
- Enables direct and flexible web interaction for AI agents
- Allows us to test all providers using the same code base (sketched below)
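The harness idea can be sketched as follows: because every benchmarked service exposes a Playwright-compatible endpoint, the same task code runs against each provider. The endpoint URLs are placeholders.

```python
# Sketch of a provider-agnostic test harness. Endpoints are hypothetical.
from playwright.sync_api import sync_playwright

PROVIDERS = {  # placeholder endpoint URLs
    "provider_a": "wss://a.example.com",
    "provider_b": "wss://b.example.com",
}

def run_task(cdp_endpoint: str) -> str:
    """Run the same task against any Playwright-compatible endpoint."""
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint)
        page = browser.new_page()
        page.goto("https://example.com")
        title = page.title()
        browser.close()
        return title

for name, endpoint in PROVIDERS.items():
    print(name, run_task(endpoint))
```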
Benchmarked products:
- Bright Data
- Anchor Browser (anchorbrowser.io)
- Airtop (airtop.ai)
- BrowserAI
- Browserbase (browserbase.com)
- Hyperbrowser (hyperbrowser.ai)
- Steel.dev
- Zenrows
Remote browser requirements for AI agent types
The requirements for remote browsers vary depending on the type and intended use of the AI agent employing them. AI agents can be broadly categorized by their operational mode, which in turn dictates specific demands on the remote browser infrastructure:
- Backend AI Agents: These agents typically operate autonomously or with minimal direct human oversight, often triggered by system events or scheduled tasks. They require remote browsers optimized for stability, scalability, and robust error handling during prolonged operations.
- Real-time AI Agents: These agents interact directly with end-users who are actively waiting for a response. For these, remote browsers must prioritize low latency, high responsiveness, and consistent performance.
Backend agents
Typical use cases & agents:
- Applicant tracking & management
- AI SDR
- Meeting scheduling
- Price monitoring
- Web automation
Real-time agents
Typical use cases & agents:
- Research: OpenAI Deep Research
- Financial analyst
Additional requirements
- Fast responses
- Infrastructure stability for real-time use (i.e. response times should not degrade with parallel use).
Remote browser benchmark methodology
In the benchmark, we:
- Used an agent that leverages a frontier LLM and the agent browsers.
- Collected metrics for all web requests sent from these agents. If the choice of agent browser had an impact on the result (e.g., due to issues in the retrieved data), we noted it.
A successful request meets the following criteria:
- HTTP response code: 200.
- Returned content is of reasonable size. The size threshold is determined by comparing the content sizes of responses from all providers: if a response is smaller than half the average response size for that specific URL, it is classified as incorrect or missing (see the sketch below).
- The page contains a specific CSS selector expected for that type of page. Where we had not identified a CSS selector for a specific page, we leveraged an LLM to flag partial or empty responses.
Finally, we validated our approach by cross-checking actual responses to ensure that the answers assumed to be incorrect are actually incorrect, ensuring a high true negative rate.
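The size-threshold rule can be expressed as a short sketch; the byte counts in the example are hypothetical.

```python
# Minimal sketch of the size-threshold rule: a response is flagged if it
# is smaller than half the average response size collected for the same
# URL across all providers. Sizes below are hypothetical.
from statistics import mean

def classify_by_size(sizes_for_url: list[int], candidate_size: int) -> str:
    threshold = mean(sizes_for_url) / 2
    return "ok" if candidate_size >= threshold else "incorrect_or_missing"

# Example: responses of 100 kB, 95 kB, and 12 kB for the same URL.
sizes = [100_000, 95_000, 12_000]
print(classify_by_size(sizes, 12_000))  # -> incorrect_or_missing
```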
Task 1: Backend agent AI buyer
Scenario: The sales team describes potential gift ideas for customers' special days (e.g., birthdays) to the AI buyer, along with a budget. The AI buyer then crawls Temu and other e-commerce websites that allow harmless bot activity to identify the best gifts. The agent then buys these gifts.
Test cases: 3 gifts.
Necessary steps: Website search, navigation, filling forms/fields, and purchase
Task 2: Backend agent AI SDR for lead generation
Scenario: The AI SDR receives a list of companies and a description of the ideal lead from the marketing team. It then crawls online sources like LinkedIn to identify the right leads, searches the web to craft personalized outbound messages for them, and sends these messages via social media or email.
Test cases: 10 companies. Each company could yield a few leads.
Necessary steps: Navigation, filling forms/fields, web search
Necessary capabilities: Navigation, filling forms/fields
Challenges & mitigations
Though we aim to run exactly the same test for all remote browsers, there are some challenges:
- LLMs are probabilistic; as a result, our agents may direct different agent browsers to different websites. Mitigations:
  - We leveraged guardrails and a low temperature setting to minimize variations (see the sketch below).
  - We made queries as specific as possible.
  - We ran each agent multiple times (e.g., 5) to ensure that all tested solutions received similar requests.
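As a minimal illustration of the low-temperature mitigation, the sketch below uses the OpenAI Python SDK; any LLM client with a temperature parameter works the same way. The model name and prompt are placeholders.

```python
# Sketch of the low-temperature mitigation, assuming the OpenAI Python SDK.
# Model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    temperature=0,   # near-deterministic output minimizes run-to-run variation
    messages=[{"role": "user", "content": "Go to example.com and extract the page title."}],
)
print(response.choices[0].message.content)
```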
Why are remote browsers important?
If an AI agent must perform human-like actions online, it needs a remote browser for effective interaction. Typical actions, such as product searches, form filling, and site navigation, require a robust browsing infrastructure that mimics human behavior. Without such an infrastructure, browsing activities get blocked by anti-scraping measures. Thus, the quality of remote browsers has a significant impact on the success rates and performance of AI agents.