How to Scrape Reviews for Free Using Python (No APIs)

Gulbahar Karatas
updated on Oct 3, 2025

This article shows how I scraped reviews from Google Play, Yelp, and Trustpilot using Python. Each platform required a different method:

  • Trustpilot company pages: requests + BeautifulSoup with multiple selectors for changing HTML.
  • Yelp businesses: Selenium with an anti-detection setup to bypass strong bot protection.
  • Google Play Store apps: Selenium with scrolling and CSS selectors for fast, structured results.

Along the way, you'll learn how to handle anti-detection, parse customer review data, and save the results as CSV or JSON for analysis.

How to Scrape Google Play Reviews with Python & Selenium

Step 1: Setting up anti-detection

Scraping Google Play reviews requires hiding automation signals. The script configures Chrome with disabled automation flags, a custom user agent, and a fixed window size.

Start with headless=False to monitor the browser, then switch to True once stable.

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

class PlayStoreReviewScraper:
    def __init__(self, headless=False):
        chrome_options = Options()
        if headless:
            chrome_options.add_argument("--headless")

        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option("useAutomationExtension", False)
        chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
        chrome_options.add_argument("--window-size=1920,1080")

        self.driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()),
            options=chrome_options
        )

        self.driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
        self.wait = WebDriverWait(self.driver, 15)

Step 2: Searching for apps

This function searches the Play Store for apps matching your search term and extracts app names and URLs.

def search_apps(self, search_term, max_apps=3):
    search_url = f"https://play.google.com/store/search?q={search_term.replace(' ', '+')}&c=apps"

    print(f"Searching Play Store: {search_term}")
    print(f"URL: {search_url}\n")

    self.driver.get(search_url)
    time.sleep(5)

    apps = []

    try:
        app_elements = self.driver.find_elements(By.CSS_SELECTOR, 'a[href*="/store/apps/details?id="]')
        print(f"Found {len(app_elements)} app links\n")

        for element in app_elements[:max_apps * 3]:
            try:
                href = element.get_attribute('href')
                if not href or 'details?id=' not in href:
                    continue

                app_id = href.split('id=')[1].split('&')[0]

                try:
                    parent = element.find_element(By.XPATH, './parent::*')
                    name_elem = parent.find_element(By.TAG_NAME, 'span')
                    app_name = name_elem.text.strip()
                except:
                    try:
                        name_elem = element.find_element(By.XPATH, '//span')
                        app_name = name_elem.text.strip()
                    except:
                        app_name = app_id

                if app_name and len(app_name) > 2 and app_name != app_id:
                    apps.append({
                        'name': app_name,
                        'app_id': app_id,
                        'url': f"https://play.google.com/store/apps/details?id={app_id}"
                    })
                    print(f" Found #{len(apps)}: {app_name}")

                if len(apps) >= max_apps:
                    break

            except:
                continue

    except Exception as e:
        print(f"Error searching apps: {e}")

    print(f"\nTotal {len(apps)} apps found!\n")
    return apps

The function builds a search URL with &c=apps to restrict results to apps. We look for links containing /store/apps/details?id=, which is Google Play's URL pattern. The app ID comes from the URL, and the app name is pulled from a span inside the link's parent element.

If no name is found, we fall back to the app ID. The max_apps parameter controls the number of apps to scrape (default: 3, but adjustable to 5, 10, or more). The search_term defines the category, e.g., “thrift shopping,” “fitness tracking,” or “photo editing.”
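
Because the app ID is just a substring of the link, you can sanity-check the split logic on its own; the URL below is a made-up example:

# Made-up URL, for illustration: extracting the app ID the same way the scraper does
href = "https://play.google.com/store/apps/details?id=com.example.app&hl=en"
app_id = href.split('id=')[1].split('&')[0]
print(app_id)  # com.example.app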

Step 3: Extracting reviews

This function navigates to an app page, clicks “See all reviews”, scrolls to load more reviews, and extracts customer review data.

def scrape_reviews(self, app_url, max_reviews=20):
    print(f"Scraping reviews: {app_url}")
    self.driver.get(app_url)
    time.sleep(4)

    reviews = []

    try:
        see_all_button = self.wait.until(
            EC.element_to_be_clickable((By.XPATH, "//span[contains(text(), 'See all reviews') or contains(text(), 'all reviews')]"))
        )
        see_all_button.click()
        print(" Clicked 'See all reviews'")
        time.sleep(5)

        self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[data-review-id]')))
        print(" Reviews loaded")
    except:
        print(" Could not find 'See all reviews' button, trying to scroll...")

    last_review_count = 0
    scroll_attempts = 0
    max_scroll_attempts = 5

    while len(reviews) < max_reviews and scroll_attempts < max_scroll_attempts:
        try:
            review_elements = self.driver.find_elements(By.CSS_SELECTOR, '.RHo1pe')
            if not review_elements:
                review_elements = self.driver.find_elements(By.CSS_SELECTOR, 'div[data-review-id]')

            if len(review_elements) == last_review_count:
                scroll_attempts += 1
            else:
                scroll_attempts = 0
                last_review_count = len(review_elements)

            for review_elem in review_elements:
                if len(reviews) >= max_reviews:
                    print(f" Reached {max_reviews} reviews, stopping...")
                    break

                review_id = review_elem.get_attribute('data-review-id') or str(len(reviews))
                if any(r['review_id'] == review_id for r in reviews):
                    continue

                try:
                    expand_button = review_elem.find_element(By.XPATH, ".//button[contains(@aria-label, 'See more')]")
                    self.driver.execute_script("arguments[0].click();", expand_button)
                    time.sleep(0.5)
                except:
                    pass

                user_name = 'Anonymous'
                try:
                    user_elem = review_elem.find_element(By.CSS_SELECTOR, '.X5PpBb')
                    user_name = user_elem.text.strip()
                except:
                    pass

                rating = 'N/A'
                try:
                    rating_elem = review_elem.find_element(By.CSS_SELECTOR, '.iXRFPc')
                    rating_text = rating_elem.get_attribute('aria-label')
                    if rating_text and 'Rated' in rating_text:
                        rating = rating_text.split()[1]
                except:
                    pass

                date = 'N/A'
                try:
                    date_elem = review_elem.find_element(By.CSS_SELECTOR, '.bp9Aid')
                    date = date_elem.text.strip()
                except:
                    pass

                review_text = ""
                try:
                    text_elem = review_elem.find_element(By.CSS_SELECTOR, '.h3YV2d')
                    review_text = text_elem.text.strip()
                except:
                    pass

                if review_text and len(review_text) > 10:
                    reviews.append({
                        'review_id': review_id,
                        'user_name': user_name,
                        'rating': rating,
                        'date': date,
                        'text': review_text
                    })
                    print(f" Review #{len(reviews)}: {user_name} - {rating} stars")

        except Exception as e:
            print(f" Scroll error: {e}")
            break

    print(f" Total {len(reviews)} reviews scraped\n")
    return reviews

The function clicks “See all reviews” if available, then waits for review elements. If the button is missing, it falls back to scrolling. Reviews are collected until the limit is reached or five scrolls return no new data.

The scraper extracts review ID, username, rating, date, and full review text, expanding truncated reviews when possible.

CSS selectors:

  • .RHo1pe → review container
  • .X5PpBb → username
  • .iXRFPc → rating (aria-label)
  • .bp9Aid → date
  • .h3YV2d → review text

The max_reviews parameter controls the number of reviews (default 20, adjustable to 50, 100, or more).
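
The rating itself is parsed out of the .iXRFPc element's aria-label. A standalone check of that parse, using an assumed aria-label string in the "Rated X stars..." format the code above expects, looks like this:

# Assumed aria-label text, matching the "Rated X stars..." format the scraper parses
rating_text = "Rated 4 stars out of five stars"
rating = rating_text.split()[1] if rating_text and 'Rated' in rating_text else 'N/A'
print(rating)  # 4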

Step 4: Putting it all together

These functions combine the app search and customer review scraping workflow, then save results to CSV.

def scrape_multiple_apps(self, search_term, max_apps=3, reviews_per_app=10):
    apps = self.search_apps(search_term, max_apps)

    if not apps:
        print("No apps found!")
        return []

    all_results = []

    for i, app in enumerate(apps, 1):
        print(f"\n[{i}/{len(apps)}] Processing {app['name']}...")
        print("-" * 60)

        try:
            reviews = self.scrape_reviews(app['url'], max_reviews=reviews_per_app)

            for review in reviews:
                review['app_name'] = app['name']
                review['app_id'] = app['app_id']
                review['app_url'] = app['url']
                all_results.append(review)

            time.sleep(3)

        except Exception as e:
            print(f" Error: {str(e)}")
            continue

    return all_results


def save_to_csv(self, reviews, filename="playstore_reviews.csv"):
    if not reviews:
        print("No reviews to save!")
        return

    fieldnames = ['app_name', 'app_id', 'app_url', 'user_name', 'rating', 'date', 'text', 'review_id']

    with open(filename, 'w', newline="", encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(reviews)

    print(f"\n{'='*60}")
    print(f"SUCCESS: {len(reviews)} reviews saved to '{filename}'")
    print(f"{'='*60}")


def close(self):
    self.driver.quit()

Example run

The workflow searches for apps, scrapes reviews, and saves them to a CSV. Each review is tagged with the app name, ID, and URL. A three-second delay between apps reduces the risk of rate limiting.

With the default settings (3 apps × 10 reviews), the script collects about 30 reviews in 3–4 minutes. The CSV includes app name, ID, URL, username, rating, date, review text, and review ID.

if __name__ == "__main__":
    scraper = PlayStoreReviewScraper(headless=False)

    try:
        print("=" * 60)
        print("GOOGLE PLAY STORE REVIEW SCRAPER")
        print("=" * 60)

        reviews = scraper.scrape_multiple_apps(
            search_term="thrift shopping",
            max_apps=3,
            reviews_per_app=10
        )

        print(f"\n{'='*60}")
        print(f"SUMMARY: {len(reviews)} total reviews scraped!")
        print(f"{'='*60}\n")

        for i, review in enumerate(reviews[:3], 1):
            print(f"Review #{i}:")
            print(f"App: {review['app_name']}")
            print(f"User: {review['user_name']}")
            print(f"Rating: {review['rating']}")
            print(f"Review: {review['text'][:80]}...")
            print("-" * 60)

        if reviews:
            scraper.save_to_csv(reviews, "playstore_secondhand_reviews.csv")

    finally:
        scraper.close()

How to Scrape Yelp Reviews with Python (No API)

Step 1: Setting up anti-detection

When we first attempted to scrape Yelp using basic Selenium, we immediately encountered a CAPTCHA. Yelp detects automation signals, so basic setups fail.

Through testing, we found Yelp checks specific browser properties. For example, the navigator.webdriver property in JavaScript returns true when Selenium is active. Chrome’s automation flags and the user-agent string can also reveal automation.

Here’s the setup that worked:

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

class YelpReviewScraper:
    def __init__(self, headless=False):
        chrome_options = Options()
        if headless:
            chrome_options.add_argument("--headless")

        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option("useAutomationExtension", False)
        chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
        chrome_options.add_argument("--window-size=1920,1080")

        self.driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()),
            options=chrome_options
        )

        self.driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
        self.wait = WebDriverWait(self.driver, 15)

The key breakthrough was redefining the navigator.webdriver property. Overriding it to return undefined means the most common automation check no longer flags the browser. Combined with a custom user-agent string, this makes the session look much more like a real user.

Start with headless=False to monitor the run. Once stable, switch to True for faster background scraping.
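
Note that execute_script only patches the page that is currently open, so the flag can reappear after navigation. If that happens in your tests, the Chrome DevTools Protocol can install the same override for every new document; this is a sketch, not part of the scraper above:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Sketch: apply the navigator.webdriver override before any page JavaScript runs
options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(options=options)
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"}
)
driver.get("https://www.yelp.com")
print(driver.execute_script("return navigator.webdriver"))  # expected: None
driver.quit()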

Step 2: Searching and finding businesses

Yelp’s search results load progressively, and the HTML uses dynamic class names that change often. This breaks selectors based on exact classes.

Our first attempts captured irrelevant links like “See more reviews” and “View menu”. Filtering was added to target only real business links.

def search_businesses(self, search_term, location, max_businesses=10):
    search_url = f"https://www.yelp.com/search?find_desc={search_term.replace(' ', '+')}&find_loc={location.replace(' ', '+')}"

    print(f"Searching: {search_term} in {location}")
    print(f"URL: {search_url}\n")

    self.driver.get(search_url)
    time.sleep(5)

    try:
        title = self.driver.title
        if "captcha" in title.lower():
            print("CAPTCHA detected!")
            input("Solve CAPTCHA and press ENTER...")
    except:
        pass

    businesses = []
    scroll_count = 0
    max_scrolls = 5

    while len(businesses) < max_businesses and scroll_count < max_scrolls:
        self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)

        business_elements = self.driver.find_elements(By.CSS_SELECTOR, 'a[href*="/biz/"]')
        print(f"Scroll #{scroll_count + 1}: Checking {len(business_elements)} links...")

        for element in business_elements:
            if len(businesses) >= max_businesses:
                break
            try:
                href = element.get_attribute('href')
                text = element.text.strip()

                if not href or '/biz/' not in href:
                    continue
                if any(skip in text.lower() for skip in ['see more', 'read more', 'view', 'show', 'all reviews', 'photo', 'menu']):
                    continue

                clean_url = href.split('?')[0].split('#')[0]
                if clean_url in [b['url'] for b in businesses]:
                    continue

                name = text if len(text) >= 3 else clean_url.split('/biz/')[-1].replace('-', ' ').title()

                businesses.append({
                    'name': name,
                    'url': clean_url
                })
                print(f" Found #{len(businesses)}: {name}")
            except:
                continue

        scroll_count += 1

    print(f"\nTotal {len(businesses)} businesses found!\n")
    return businesses

We scroll multiple times since Yelp loads results progressively. The scraper collects links containing /biz/, Yelp’s URL pattern for business pages. Filtering ensures only valid businesses are included, while duplicates and names shorter than three characters are skipped.

The max_businesses parameter controls how many results are scraped. Start with 3 for testing, then increase once stable.
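
The URL cleanup and fallback name are simple string operations, so they can be checked in isolation; the business link below is invented for illustration:

# Invented Yelp link, for illustration: same cleanup the scraper applies to /biz/ links
href = "https://www.yelp.com/biz/blue-bottle-coffee-san-francisco?osq=coffee#reviews"
clean_url = href.split('?')[0].split('#')[0]
fallback_name = clean_url.split('/biz/')[-1].replace('-', ' ').title()
print(clean_url)       # https://www.yelp.com/biz/blue-bottle-coffee-san-francisco
print(fallback_name)   # Blue Bottle Coffee San Francisco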

Step 3: Extracting reviews

Yelp’s biggest challenge is unstable HTML. Standard CSS selectors often failed or returned the wrong elements.

By inspecting with Chrome DevTools, we found reviews inside paragraph tags with class names containing "comment". Within them, span tags whose class names contain "raw" hold the actual review text. This pattern stays consistent even when the exact class names change.

def scrape_reviews(self, business_url, max_reviews=20):
    print(f"Scraping reviews: {business_url}")
    self.driver.get(business_url)
    time.sleep(4)

    reviews = []

    for i in range(3):
        self.driver.execute_script(f"window.scrollTo(0, {1000 * (i+1)});")
        time.sleep(1.5)

    try:
        comment_elements = self.driver.find_elements(By.CSS_SELECTOR, 'p[class*="comment"]')
        print(f" Found {len(comment_elements)} review texts")

        for comment_elem in comment_elements[:max_reviews]:
            try:
                text_span = comment_elem.find_element(By.CSS_SELECTOR, 'span[class*="raw"]')
                review_text = text_span.text.strip()

                if not review_text or len(review_text) < 20:
                    continue

                user_name = "Anonymous"
                try:
                    review_container = comment_elem.find_element(By.XPATH, './ancestor::li[1]')
                    user_links = review_container.find_elements(By.CSS_SELECTOR, 'a[href*="/user_details"]')
                    if user_links:
                        user_name = user_links[0].text.strip()
                except:
                    pass

                rating = "N/A"
                try:
                    rating_imgs = review_container.find_elements(By.CSS_SELECTOR, 'img[alt*="star"]')
                    if rating_imgs:
                        rating = rating_imgs[0].get_attribute('alt').split()[0]
                except:
                    pass

                date = "N/A"
                try:
                    date_spans = review_container.find_elements(By.TAG_NAME, 'span')
                    for span in date_spans:
                        span_text = span.text.strip()
                        if '/' in span_text or 'ago' in span_text.lower() or 'day' in span_text.lower():
                            date = span_text
                            break
                except:
                    pass

                reviews.append({
                    'user_name': user_name,
                    'rating': rating,
                    'date': date,
                    'text': review_text
                })
                print(f" Review #{len(reviews)}: {user_name} - {rating} stars")

            except:
                continue

    except Exception as e:
        print(f" Error scraping reviews: {e}")

    print(f" Total {len(reviews)} reviews scraped\n")
    return reviews

This approach uses partial attribute matching with [class*="comment"] and [class*="raw"], which makes the scraper more resilient, since Yelp frequently changes exact class names. The script extracts the review text, username, rating, and date.
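
To see what partial matching buys you, here is a small illustration on an invented HTML fragment (not Yelp's real markup), parsed with BeautifulSoup just to show the selector semantics:

from bs4 import BeautifulSoup

# Invented fragment with hashed class names, similar in spirit to Yelp's markup
html = '<p class="comment__09f24__D0cxf"><span class="raw__09f24__T4Ezm">Great latte.</span></p>'
soup = BeautifulSoup(html, 'html.parser')
comment = soup.select_one('p[class*="comment"]')  # matches despite the hash suffix
print(comment.select_one('span[class*="raw"]').get_text(strip=True))  # Great latte.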

Step 4: Putting it all together

Now we combine everything into a workflow that searches for businesses, scrapes their reviews, and saves the results to CSV.

def scrape_multiple_businesses(self, search_term, location, max_businesses=3, reviews_per_business=10):
    businesses = self.search_businesses(search_term, location, max_businesses)

    if not businesses:
        print("No businesses found!")
        return []

    all_results = []

    for i, business in enumerate(businesses, 1):
        print(f"\n[{i}/{len(businesses)}] Processing {business['name']}...")
        print("-" * 60)

        try:
            reviews = self.scrape_reviews(business['url'], max_reviews=reviews_per_business)

            for review in reviews:
                review['business_name'] = business['name']
                review['business_url'] = business['url']
                all_results.append(review)

            time.sleep(2)

        except Exception as e:
            print(f" Error: {str(e)}")
            continue

    return all_results


def save_to_csv(self, reviews, filename="yelp_reviews.csv"):
    if not reviews:
        print("No reviews to save!")
        return

    fieldnames = ['business_name', 'business_url', 'user_name', 'rating', 'date', 'text']

    with open(filename, 'w', newline="", encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(reviews)

    print(f"\n{'='*60}")
    print(f"SUCCESS: {len(reviews)} reviews saved to '{filename}'")
    print(f"{'='*60}")


def close(self):
    self.driver.quit()

Example run

A two-second delay between businesses reduces the risk of rate limiting. In tests, two seconds was reliable, but you can shorten it to one second for small runs or increase it to five seconds for large-scale scraping.

Each review is tagged with the business name and URL before being saved, allowing you to trace the source.

With the default settings of 3 businesses and 10 reviews each, the script collects ~30 reviews in 2–3 minutes. Once stable, you can scale up:

  • 10 businesses × 20 reviews each: ~200 reviews in ~10 minutes
  • 20 businesses × 50 reviews each: ~1000 reviews in ~15–20 minutes

The CSV file includes columns for business name, URL, username, rating, date, and review text. It can be opened in Excel or imported into pandas for analyzing customer feedback.

if __name__ == "__main__":
    scraper = YelpReviewScraper(headless=False)

    try:
        print("=" * 60)
        print("YELP REVIEW SCRAPER")
        print("=" * 60)

        reviews = scraper.scrape_multiple_businesses(
            search_term="coffee",
            location="San Francisco",
            max_businesses=3,
            reviews_per_business=10
        )

        print(f"\n{'='*60}")
        print(f"SUMMARY: {len(reviews)} total reviews scraped!")
        print(f"{'='*60}\n")

        for i, review in enumerate(reviews[:3], 1):
            print(f"Review #{i}:")
            print(f"Business: {review['business_name']}")
            print(f"User: {review['user_name']}")
            print(f"Rating: {review['rating']}")
            print(f"Review: {review['text'][:80]}...")
            print("-" * 60)

        if reviews:
            scraper.save_to_csv(reviews, "yelp_coffee_reviews.csv")

    finally:
        scraper.close()

How to Scrape Trustpilot Reviews with Python

Step 1: Setting up and searching for companies

Required libraries

import requests
from bs4 import BeautifulSoup
import json
import time
from urllib.parse import quote

We import the necessary libraries:

  • Requests: handles HTTP requests
  • BeautifulSoup: parses the HTML we receive
  • JSON: saves data in a structured format
  • time: adds delays to avoid overwhelming the server
  • quote from urllib.parse: encodes search terms for URLs

The search function

def search_travel_agencies(search_term="travel agency", location=None):
    encoded_query = quote(search_term)

    if location:
        encoded_location = quote(location)
        url = f"https://www.trustpilot.com/search?query={encoded_query}&location={encoded_location}"
    else:
        url = f"https://www.trustpilot.com/search?query={encoded_query}"

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1'
    }

The search term is URL-encoded with quote(). If a location is provided, it’s added as a parameter. Custom headers mimic a real browser to reduce blocking, with a User-Agent string that identifies us as Chrome on Windows.
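
The encoding step is standard library behavior and easy to verify on its own:

from urllib.parse import quote

# quote() percent-encodes characters that are not URL-safe, including spaces
print(quote("travel agency"))  # travel%20agency
print(f"https://www.trustpilot.com/search?query={quote('travel agency')}")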

Making the request and parsing results

    try:
        response = requests.get(url, headers=headers, timeout=15)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')

        companies = []
        selectors = [
            ('a[href*="/review/"]', 'href'),
            ('.link_internal__7XN06', 'href'),
            ('.styles_businessUnitLink__AjhzZ', 'href'),
            ('div[class*="businessUnit"] a', 'href'),
            ('article a[href*="/review/"]', 'href')
        ]

        for selector, attr in selectors:
            links = soup.select(selector)
            if links:
                for link in links[:5]:
                    href = link.get(attr, "")
                    if '/review/' in href and href not in [c.get('url', "") for c in companies]:
                        company_slug = href.split('/review/')[-1].strip('/').split('?')[0]
                        if company_slug:
                            full_name = company_slug.replace('-', ' ').replace('www.', '').title()
                            companies.append({
                                'slug': company_slug,
                                'name': full_name,
                                'url': f"/review/{company_slug}"
                            })
                if len(companies) >= 3:
                    break

        return companies[:3]

    except Exception as e:
        print(f"Search error: {e}")
        return []

We send the request with headers and parse results using BeautifulSoup. Because Trustpilot often changes class names, multiple selectors are defined.

Each selector targets links with /review/, which mark company pages. From each link, we extract the slug (unique identifier in the URL), clean it into a readable name, and return the first three companies found.
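
The slug-to-name cleanup is plain string handling and can be checked independently; the link below is a made-up example:

# Made-up Trustpilot link, for illustration: slug extraction and name cleanup as in the search function
href = "/review/www.booking.com?languages=en"
company_slug = href.split('/review/')[-1].strip('/').split('?')[0]
full_name = company_slug.replace('-', ' ').replace('www.', '').title()
print(company_slug)  # www.booking.com
print(full_name)     # Booking.Com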

Step 2: Fetching review data from company pages

The review-fetching function

def fetch_reviews(company_slug, max_reviews=10):
    reviews = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5'
    }

    page = 1
    while len(reviews) < max_reviews:
        url = f"https://www.trustpilot.com/review/{company_slug}?page={page}"

        try:
            response = requests.get(url, headers=headers, timeout=15)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')

The function loops through pages until the desired number of reviews is collected. Each page is requested with headers to avoid detection, and pagination is handled by incrementing the page parameter.

Extracting review cards

            review_selectors = [
                'article[data-service-review-card-paper]',
                'div[data-service-review-card-paper]',
                'article.paper_paper__lyudo',
                'div.styles_reviewCard__hcAvI',
                'article'
            ]

            review_cards = []
            for selector in review_selectors:
                review_cards = soup.select(selector)
                if review_cards:
                    break

            if not review_cards:
                break

We try multiple selectors because Trustpilot frequently changes its design. Each selector targets possible review card structures. If none match, we stop scraping.

Parsing individual reviews

            for card in review_cards:
                if len(reviews) >= max_reviews:
                    break

                review = {}

                # Extract rating
                star_elem = card.find('div', {'class': lambda x: x and 'star' in x.lower()})
                if not star_elem:
                    star_elem = card.find('img', {'alt': lambda x: x and 'star' in x.lower()})
                if star_elem and star_elem.name == 'img':
                    review['stars'] = star_elem.get('alt', 'N/A')
                elif star_elem and star_elem.find('img'):
                    review['stars'] = star_elem.find('img').get('alt', 'N/A')
                else:
                    review['stars'] = 'N/A'

                # Extract title
                title = card.find('h2') or card.find('h3')
                review['title'] = title.get_text(strip=True) if title else 'N/A'

                # Extract review text
                text = card.find('p', {'data-service-review-text-typography': True})
                if not text:
                    text = card.find('p', {'class': lambda x: x and 'body' in str(x).lower()})
                review['text'] = text.get_text(strip=True) if text else 'N/A'

                # Extract date
                date = card.find('time')
                review['date'] = date.get('datetime', date.get_text(strip=True)) if date else 'N/A'

                # Extract user
                user = card.find('span', {'data-consumer-name-typography': True})
                if not user:
                    user = card.find('span', {'class': lambda x: x and 'consumer' in str(x).lower()})
                review['user'] = user.get_text(strip=True) if user else 'N/A'

                reviews.append(review)

            page += 1
            time.sleep(2)

        except Exception as e:
            break

    return reviews[:max_reviews]

For each review, we extract the rating, title, review text, date, and username. Flexible selectors (with lambda) make the scraper resilient to HTML changes.

After processing each page, we add a 2-second delay using time.sleep(2). This is crucial for being respectful to Trustpilot’s servers and avoiding rate limiting or IP bans.
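
The lambda-based class filters are what keep the parser working when Trustpilot renames its hashed classes. Here is a small illustration on an invented fragment (not Trustpilot's real markup):

from bs4 import BeautifulSoup

# Invented fragment: the filter matches any class containing "star", whatever the hash suffix
html = '<div class="star-rating_starRating__sdbkn"><img alt="Rated 5 out of 5 stars"></div>'
soup = BeautifulSoup(html, 'html.parser')
star = soup.find('div', {'class': lambda x: x and 'star' in x.lower()})
print(star.find('img').get('alt', 'N/A'))  # Rated 5 out of 5 stars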

Step 3: Main program and output

Main function setup

def main():
    print("="*60)
    print("TRUSTPILOT REVIEW SCRAPER")
    print("="*60)

    search_term = "travel agency"
    location = "United States"
    max_reviews_per_company = 10

    if location:
        print(f"Location: {location}")
    print(f"Search term: {search_term}")
    print(f"Max reviews per company: {max_reviews_per_company}\n")

    companies = search_travel_agencies(search_term, location)

    if not companies:
        print("\nNo companies found!")
        print("Using fallback: Known travel agencies\n")
        companies = [
            {'slug': 'www.booking.com', 'name': 'Booking.com'},
            {'slug': 'www.expedia.com', 'name': 'Expedia'},
            {'slug': 'www.tripadvisor.com', 'name': 'TripAdvisor'}
        ]

    print(f"Found {len(companies)} travel agencies\n")
    print("-"*60)

This main function defines the search term, location, and review limit. The location can be set to any country (e.g., “Germany”) or None for global results. The fallback ensures functionality even if the search fails.

Collecting and saving data

    all_data = {}

    for i, company in enumerate(companies, 1):
        print(f"\n[{i}/{len(companies)}] Fetching reviews for {company['name']}...")
        reviews = fetch_reviews(company['slug'], max_reviews=max_reviews_per_company)

        all_data[company['name']] = {
            'url': f"trustpilot.com/review/{company['slug']}",
            'review_count': len(reviews),
            'reviews': reviews
        }

        print(f"✓ {len(reviews)} reviews fetched")
        time.sleep(2)

    filename = 'travel_agency_reviews.json'
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(all_data, f, ensure_ascii=False, indent=2)

Each company’s reviews are stored in a dictionary with metadata (URL, review count). A 2-second delay is added between companies to respect Trustpilot’s servers. Finally, the results are saved to a JSON file with UTF-8 encoding.
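
For reference, the resulting JSON has one entry per company, keyed by name; the sketch below uses invented placeholder values to show the shape only:

import json

# Placeholder values only; field names mirror the dictionaries built above
all_data = {
    "Booking.Com": {
        "url": "trustpilot.com/review/www.booking.com",
        "review_count": 1,
        "reviews": [
            {
                "stars": "Rated 5 out of 5 stars",
                "title": "Smooth booking",
                "text": "Everything worked as expected.",
                "date": "2025-01-15T10:00:00.000Z",
                "user": "Jane D."
            }
        ]
    }
}
print(json.dumps(all_data, ensure_ascii=False, indent=2))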

Displaying results

    print("\n" + "="*60)
    print("REVIEWS SUMMARY")
    print("="*60)

    total_reviews = 0
    for company_name, data in all_data.items():
        print(f"\n{'='*60}")
        print(f"AGENCY: {company_name}")
        print(f"Total Reviews: {data['review_count']}")
        print("="*60)

        total_reviews += data['review_count']

        for idx, review in enumerate(data['reviews'], 1):
            print(f"\nReview #{idx}")
            print(f" User: {review.get('user', 'N/A')}")
            print(f" Stars: {review.get('stars', 'N/A')}")
            print(f" Title: {review.get('title', 'N/A')}")
            print(f" Review: {review.get('text', 'N/A')}")
            print("-"*40)

    print("\n" + "="*60)
    print(f"Total {total_reviews} reviews scraped")
    print(f"Data saved to '{filename}'")
    print("="*60 + "\n")


if __name__ == '__main__':
    main()

The script prints a clean summary of all reviews. Each review displays the user, rating, title, and text. The .get() method ensures missing fields default to 'N/A'. Finally, the script confirms the total reviews scraped and the JSON filename.

Final thoughts

Scraping reviews from Google Play, Yelp, and Trustpilot required different Python approaches. Each scraper exported ~30 reviews per run in CSV/JSON with usernames, ratings, dates, and text.

Gulbahar Karatas
Industry Analyst
Gülbahar is an AIMultiple industry analyst focused on web data collection, applications of web data and application security.
