How to Bypass CAPTCHA in Web Scraping

Gulbahar Karatas
updated on Sep 16, 2025

CAPTCHAs are not static. They constantly evolve to counter bypass techniques, so scrapers must adapt continuously and rely on solutions that stay reliable as the challenges change.

This article discusses the top methods and tools for CAPTCHA bypass, alongside an overview of the various CAPTCHA types available today.

What does CAPTCHA mean?

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is an automated challenge-response test used on computing systems to ensure that the user is human and not a bot.

As AI and automation techniques improve, bots become better at solving existing CAPTCHA types. To stay ahead, CAPTCHA developers introduce new challenges and more behavioral analysis.

Three approaches to bypassing CAPTCHAs

When faced with a CAPTCHA, there are generally three main strategies:

1. Using a stealth browser (mimicking human behavior)

This approach focuses on making an automated browser appear as human as possible. It involves:

  • Headless browser detection evasion: Automated browsers expose properties that a website’s JavaScript can inspect. For example, the target website might check whether navigator.webdriver is set to determine if a bot is present. Stealth libraries like playwright-extra with its stealth plugin or selenium-stealth modify these properties before the page’s scripts run, for instance by making navigator.webdriver false or undefined.
  • Randomized interactions: Bots, by default, often perform actions with machine-like precision and speed, so randomizing timing and movement makes them less predictable. For instance, instead of calling time.sleep(1) before every action, you can call time.sleep(random.uniform(1.5, 3.0)). Instead of jumping straight to a target element and clicking it, you can move the mouse cursor along a slightly curved, non-linear path from its current position.
  • Cookie and session management: Websites use cookies to maintain state, track user preferences, and identify returning visitors. You can ensure that your scraping framework saves and reloads cookies for subsequent requests to allow the website to recognize the user as a returning visitor.
  • Proxy rotation: A single IP address making thousands of requests in a short period is a bot signature. With this method, you acquire a pool of proxy servers, such as residential proxies, and implement logic to rotate them frequently. This distributes your request load across many IP addresses.
  • Browser fingerprint spoofing: A browser fingerprint is a unique identifier generated from a combination of your browser’s configuration and system settings. Stealth libraries actively modify or randomize these elements. For example, they might inject JavaScript to override the CanvasRenderingContext2D.prototype.toDataURL method, either adding noise to the output so the canvas fingerprint cannot serve as a stable identifier or returning output that matches a common configuration.
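Randomized interactions and proxy rotation from the list above can be sketched in plain Python. This is a minimal illustration using only the standard library, not a complete stealth setup: the function names (human_delay, curved_path, rotate_proxies) are hypothetical, and passing the generated points to a real browser's mouse API is left out.

```python
import itertools
import random


def human_delay(low=1.5, high=3.0):
    """Return a random pause length in seconds (pass it to time.sleep)."""
    return random.uniform(low, high)


def curved_path(start, end, steps=20, jitter=80.0):
    """Return points along a quadratic Bezier curve from start to end.

    The control point is the midpoint pushed off the straight line by a
    random offset, so the cursor follows a slightly different arc each time.
    """
    (x0, y0), (x2, y2) = start, end
    cx = (x0 + x2) / 2 + random.uniform(-jitter, jitter)
    cy = (y0 + y2) / 2 + random.uniform(-jitter, jitter)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((x, y))
    return points


def rotate_proxies(proxies):
    """Cycle through the proxy pool forever, one proxy per request."""
    return itertools.cycle(proxies)
```

In a scraper, you would sleep for human_delay() seconds between actions, move the cursor through each point returned by curved_path before clicking, and call next() on the rotate_proxies iterator to pick the proxy for each request.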

2. Using Artificial Intelligence (AI) for image recognition

AI, particularly deep learning models, can be trained to solve image-based CAPTCHAs. The main requirement is a labeled training dataset.

To teach a model to read CAPTCHA images, you need thousands, or even tens of thousands, of CAPTCHA images paired with their correct solutions. This is often the most labor-intensive part.

You might scrape CAPTCHAs, send them to a low-cost human solver service once, and then use those solutions to build your dataset. However, if the target website (Amazon, for example) updates its CAPTCHA design, your old dataset might become obsolete.
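A minimal sketch of the dataset-preparation step, assuming each solved CAPTCHA is saved as a PNG whose filename is its solution (for example x7k2p.png). The naming convention, the alphabet, and the function names are assumptions made for illustration; the model and training loop are omitted.

```python
import string
from pathlib import Path

# Assumed alphabet: lowercase letters followed by digits.
ALPHABET = string.ascii_lowercase + string.digits
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_label(text):
    """Map a CAPTCHA solution to a list of class indices, one per character."""
    return [CHAR_TO_INDEX[ch] for ch in text.lower()]


def decode_label(indices):
    """Inverse of encode_label: turn predicted indices back into text."""
    return "".join(ALPHABET[i] for i in indices)


def load_dataset(folder):
    """Pair every PNG in folder with the label encoded from its filename."""
    samples = []
    for path in sorted(Path(folder).glob("*.png")):
        samples.append((path, encode_label(path.stem)))
    return samples
```

Storing labels as per-character class indices keeps the dataset independent of any particular model, so the same files can be reused if you later change the architecture.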

3. Using CAPTCHA solving services (human or hybrid)

Using a CAPTCHA-solving service is often the most reliable method. These services act as an intermediary:

  • Human solvers: The CAPTCHA image is sent to a pool of human workers who solve it in real time. Services like 2Captcha, Anti-Captcha, or DeathByCaptcha fall into this category. For example, when your scraper captures the CAPTCHA image, it sends the image via an API call to the CAPTCHA-solving service. A human worker solves the CAPTCHA and submits the solution back to the service. The service then returns the solution to your scraper via its API.
  • Hybrid solvers: They use AI models to solve simpler, well-understood CAPTCHAs and fall back to human solvers for more complex, new, or difficult CAPTCHAs. The service internally routes each CAPTCHA to either an AI engine or a human worker based on its complexity.
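The submit-then-wait flow described above can be sketched as follows. The client interface here (submit and get_result) is hypothetical and does not correspond to any specific provider's API; FakeSolver is a stand-in so the sketch runs without network access.

```python
import time


class FakeSolver:
    """Stand-in for a solving service; a real client would call its HTTP API."""

    def __init__(self, answer, ready_after=2):
        self.answer = answer
        self.polls_left = ready_after

    def submit(self, image_bytes):
        # A real service would upload the image and return a task ID.
        return "task-1"

    def get_result(self, task_id):
        # Returns None while a worker (human or AI) is still solving.
        if self.polls_left > 0:
            self.polls_left -= 1
            return None
        return self.answer


def solve_captcha(client, image_bytes, poll_interval=5.0, timeout=120.0):
    """Submit a CAPTCHA, then poll until a solution arrives or time runs out."""
    task_id = client.submit(image_bytes)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = client.get_result(task_id)
        if result is not None:
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"no solution for {task_id} within {timeout} seconds")
```

The timeout matters in practice because a solver may take a while to respond, and a scraper that waits indefinitely will stall the whole pipeline.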

Why are CAPTCHAs a challenge for web scraping?

CAPTCHAs are the digital bouncers of the internet: they differentiate between human users and automated programs (bots). Web scrapers tend to give themselves away because they click in the same spots, navigate pages in the same order, and send requests from a limited set of IP addresses.

For example, many scrapers use headless browsers (e.g., Chrome or Firefox in headless mode, driven by tools such as Puppeteer or Playwright). By default, these browsers have specific JavaScript-visible properties that websites can detect, such as missing plugins.
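The navigator.webdriver property is one such detectable trait. The sketch below shows how a Playwright script in Python might override it before any page script runs. It assumes Playwright and a Chromium build are installed, the function name fetch_title is hypothetical, and patching this single property is an illustration only, since detection typically combines many signals.

```python
# JavaScript that runs before any page script and hides navigator.webdriver.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
});
"""


def fetch_title(url):
    """Open url in headless Chromium with the override applied; return the title."""
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Register the override so it is evaluated on every navigation.
        page.add_init_script(HIDE_WEBDRIVER_JS)
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```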

CAPTCHA is a challenge for web scraping because it forces automated systems to mimic organic, unpredictable human behavior and cognitive abilities. The process of encountering, solving, and submitting a CAPTCHA introduces delays into the scraping workflow.

What are the common types of CAPTCHAs?

This article covers six types of CAPTCHAs, each offering a different level of protection against bots and automated programs. The most common types are:

1. Image-based CAPTCHAs

Image-based CAPTCHA displays a distorted image of a word or sequence of characters that the user must recognize and enter into a text field (Figure 1).

The image distortion is intended to make it more difficult for automated programs to identify the characters while remaining solvable by a real person. Image-based CAPTCHAs are effective at preventing bots from accessing websites, despite being more difficult and time-consuming for users to solve.

However, some machine learning algorithms, such as convolutional neural networks (CNNs) and support vector machines (SVMs), can accurately solve a variety of image-based CAPTCHAs. These algorithms work by training a model on large datasets of CAPTCHA images so that it learns to recognize the patterns of the characters within an image.

As a result, many websites have adopted more advanced CAPTCHA challenges, such as interactive CAPTCHA and “No CAPTCHA”. These CAPTCHAs use different challenges to differentiate between real people and bots.

Figure 1: An example of an image-based CAPTCHA solution

Image-based CAPTCHAs challenge users by showing a distorted image of a word or sequence of characters.

2. Audio-based CAPTCHAs

Audio-based CAPTCHA presents a distorted audio clip of a word or series of characters (Figure 2). The user must listen to the audio clip and correctly identify the word or characters given in the clip. This type of CAPTCHA is often used for users with visual impairments.

Figure 2: An example of audio-based CAPTCHA

3. Text-based CAPTCHAs

Text-based CAPTCHA displays text in odd, distorted formatting. The user must correctly identify the characters and enter them into a text field to pass the test.

4. Math-based CAPTCHAs

Math-based CAPTCHA presents the user with a simple math problem to solve and enter into a text field, such as “What is 3 + 2?”.

Figure 3: Example of a math-based CAPTCHA

Math-based CAPTCHAs test the user with a simple math problem to solve.
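When the problem is shown as plain text, a short parser can usually answer it. The sketch below handles a question like “What is 3 + 2?”; the function name and the supported operators are assumptions, and it does not cover problems rendered as images or written in words.

```python
import operator
import re

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_math_captcha(question):
    """Extract 'a op b' from the question text and return the integer result."""
    match = PATTERN.search(question)
    if match is None:
        raise ValueError(f"no arithmetic expression found in {question!r}")
    left, symbol, right = match.groups()
    return OPERATORS[symbol](int(left), int(right))
```

For example, solve_math_captcha("What is 3 + 2?") returns 5.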

5. Interactive CAPTCHAs

Interactive CAPTCHA presents a series of puzzles or games the user must complete to prove they are human.

6. Checkbox-based CAPTCHAs

Checkbox-based CAPTCHA is a type of reCAPTCHA. reCAPTCHA is a free service provided by Google to help websites protect themselves from unwanted and malicious activity.

Checkbox reCAPTCHA requires users to check a box to confirm they are not robots. It may present additional challenges, such as selecting all images that match specific criteria or performing a simple math problem.

Figure 4: Process flow diagram of Google reCAPTCHA


Gülbahar is an AIMultiple industry analyst focused on web data collection, applications of web data and application security.