
How to Scrape Reviews for Free Using Python (No APIs)

Gulbahar Karatas
updated on Oct 3, 2025

This article shows how I scraped reviews from Google Play, Yelp, and Trustpilot using Python. Each platform required a different method:

  • Trustpilot company pages: requests + BeautifulSoup with multiple selectors for changing HTML.
  • Yelp businesses: Selenium with an anti-detection setup to bypass strong bot protection.
  • Google Play Store apps: Selenium, combined with the Google Play Scraper library, for fast and structured results.

Learn how to handle anti-detection, parse customer review data, and save the results as CSV or JSON, or export them to Google Sheets for analysis.

How to Scrape Google Play Reviews with Python & Selenium

Step 1: Setting up anti-detection

Scraping Google Play reviews requires hiding automation signals. The script configures Chrome with disabled automation flags, a custom user agent, and a fixed window size.

Start with headless=False to monitor the browser, then switch to True once stable.
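
A minimal sketch of that setup, assuming a standard Selenium install; the flag values, window size, and user-agent string are illustrative and can be adjusted:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

HEADLESS = False  # start visible; flip to True once the script runs reliably

def create_driver():
    options = Options()
    if HEADLESS:
        options.add_argument("--headless=new")
    options.add_argument("--window-size=1920,1080")  # fixed window size
    # Disable the most common automation signals Chrome exposes.
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    # A custom user agent so the browser does not advertise itself as headless.
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
    return webdriver.Chrome(options=options)
```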

Step 2: Searching for apps

This function searches the Play Store for apps matching your search term and extracts app names and URLs.

The function builds a search URL with &c=apps to filter apps. We look for links containing /store/apps/details?id=, which is Google Play’s URL pattern. The app ID comes from the URL, and the app name is pulled from a parent span tag.

If no name is found, we fall back to the app ID. The max_apps parameter controls the number of apps to scrape (default: 3, but adjustable to 5, 10, or more). The search_term defines the category, e.g., “thrift shopping,” “fitness tracking,” or “photo editing.”
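
A condensed sketch of this search step; search_apps is a hypothetical name, and the exact location of the name span may need adjusting after inspecting the page in DevTools:

```python
import time
from urllib.parse import quote
from selenium.webdriver.common.by import By

def search_apps(driver, search_term, max_apps=3):
    """Search the Play Store and return up to max_apps name/ID/URL records."""
    url = f"https://play.google.com/store/search?q={quote(search_term)}&c=apps"
    driver.get(url)
    time.sleep(3)  # let the results render

    apps, seen = [], set()
    links = driver.find_elements(
        By.CSS_SELECTOR, "a[href*='/store/apps/details?id=']")
    for link in links:
        href = link.get_attribute("href")
        app_id = href.split("id=")[-1].split("&")[0]  # the app ID lives in the URL
        if app_id in seen:
            continue
        seen.add(app_id)
        try:
            # The visible name sits in a span near the link; fall back to the ID.
            name = link.find_element(By.CSS_SELECTOR, "span").text or app_id
        except Exception:
            name = app_id
        apps.append({"name": name, "id": app_id, "url": href})
        if len(apps) >= max_apps:
            break
    return apps
```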

Step 3: Extracting reviews

This function navigates to an app page, clicks “See all reviews”, scrolls to load more reviews, and extracts customer review data.

The function clicks “See all reviews” if available, then waits for review elements. If the button is missing, it falls back to scrolling. Reviews are collected until the limit is reached or five scrolls return no new data.

The scraper extracts review ID, username, rating, date, and full review text, expanding truncated reviews when possible.

CSS selectors:

  • .RHo1pe → review container
  • .X5PpBb → username
  • .iXRFPc → rating (aria-label)
  • .bp9Aid → date
  • .h3YV2d → review text

The max_reviews parameter controls the number of reviews (default 20, adjustable to 50, 100, or more).
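
Putting those selectors together, a sketch of the extraction function could look like the following; get_reviews is an illustrative name, the data-review-id attribute is an assumption worth verifying in DevTools, and the "expand truncated review" click is omitted for brevity:

```python
import time
from selenium.webdriver.common.by import By

def get_reviews(driver, app_url, max_reviews=20):
    """Open an app page, load the review panel, and extract review fields."""
    driver.get(app_url)
    time.sleep(3)

    # Click "See all reviews" when the button exists; otherwise rely on scrolling.
    try:
        driver.find_element(
            By.XPATH, "//span[contains(text(), 'See all reviews')]").click()
        time.sleep(2)
    except Exception:
        pass

    reviews, processed, stale_scrolls = [], 0, 0
    while len(reviews) < max_reviews and stale_scrolls < 5:
        cards = driver.find_elements(By.CSS_SELECTOR, ".RHo1pe")  # review containers
        new_cards = cards[processed:]
        stale_scrolls = stale_scrolls + 1 if not new_cards else 0
        for card in new_cards:
            try:
                rating = card.find_element(
                    By.CSS_SELECTOR, ".iXRFPc").get_attribute("aria-label")
                reviews.append({
                    # data-review-id is an assumed attribute; verify in DevTools.
                    "review_id": card.get_attribute("data-review-id"),
                    "username": card.find_element(By.CSS_SELECTOR, ".X5PpBb").text,
                    "rating": rating,  # e.g. "Rated 4 stars out of five stars"
                    "date": card.find_element(By.CSS_SELECTOR, ".bp9Aid").text,
                    "text": card.find_element(By.CSS_SELECTOR, ".h3YV2d").text,
                })
            except Exception:
                continue  # skip cards that are missing a field
            if len(reviews) >= max_reviews:
                break
        processed = len(cards)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    return reviews
```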

Step 4: Putting it all together

These functions combine the app search and customer review scraping workflow, then save results to CSV.
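
A sketch of the combined workflow, reusing the functions above; the search term and output filename are illustrative:

```python
import csv
import time

def main():
    driver = create_driver()
    all_reviews = []
    try:
        for app in search_apps(driver, "thrift shopping", max_apps=3):
            for review in get_reviews(driver, app["url"], max_reviews=10):
                review.update(app_name=app["name"], app_id=app["id"],
                              app_url=app["url"])
                all_reviews.append(review)
            time.sleep(3)  # pause between apps to avoid rate limiting
    finally:
        driver.quit()

    if all_reviews:
        with open("play_reviews.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=all_reviews[0].keys())
            writer.writeheader()
            writer.writerows(all_reviews)

if __name__ == "__main__":
    main()
```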

Example run

The workflow searches for apps, scrapes reviews, and saves them to a CSV. Each review is tagged with the app name, ID, and URL. A three-second delay between apps prevents rate limiting.

With the default settings (3 apps × 10 reviews), the script collects about 30 reviews in 3–4 minutes. The CSV includes app name, ID, URL, username, rating, date, review text, and review ID.

How to Scrape Yelp Reviews with Python (No API)

Step 1: Setting up anti-detection

When we first attempted to scrape Yelp using basic Selenium, we immediately encountered a CAPTCHA. Yelp detects automation signals, so basic setups fail.

Through testing, we found Yelp checks specific browser properties. For example, the navigator.webdriver property in JavaScript returns true when Selenium is active. Chrome’s automation flags and the user-agent string can also reveal automation.

Here’s the setup that worked:
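
A reconstructed sketch of that configuration; create_yelp_driver is a hypothetical name and the user-agent string is illustrative:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_yelp_driver(headless=False):
    options = Options()
    if headless:
        options.add_argument("--headless=new")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
    driver = webdriver.Chrome(options=options)
    # Override navigator.webdriver before any page script runs.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', "
                   "{get: () => undefined})"},
    )
    return driver
```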

The key breakthrough was redefining the navigator.webdriver property. By overriding it to return undefined, Selenium is no longer detected. Combined with a custom user-agent string, this makes the browser appear more like a real user.

Start with headless=False to monitor the run. Once stable, switch to True for faster background scraping.

Step 2: Searching and finding businesses

Yelp’s search results load progressively, and the HTML uses dynamic class names that change often. This breaks selectors based on exact classes.

Our first attempts captured irrelevant links like “See more reviews” and “View menu”. Filtering was added to target only real business links.

We scroll multiple times since Yelp loads results progressively. The scraper collects links containing /biz/, Yelp’s URL pattern for business pages. Filtering ensures only valid businesses are included, while duplicates and names shorter than three characters are skipped.

The max_businesses parameter controls how many results are scraped. Start with 3 for testing, then increase once stable.
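
A sketch of this search step; search_businesses is a hypothetical name, and the find_desc/find_loc query parameters reflect Yelp's public search URLs:

```python
import time
from urllib.parse import quote
from selenium.webdriver.common.by import By

def search_businesses(driver, term, location, max_businesses=3):
    """Return up to max_businesses name/URL pairs from a Yelp search."""
    url = (f"https://www.yelp.com/search?find_desc={quote(term)}"
           f"&find_loc={quote(location)}")
    driver.get(url)

    # Yelp loads results progressively, so scroll a few times first.
    for _ in range(3):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)

    businesses, seen = [], set()
    for link in driver.find_elements(By.CSS_SELECTOR, "a[href*='/biz/']"):
        href = (link.get_attribute("href") or "").split("?")[0]
        name = link.text.strip()
        # Keep only real business links: a name of 3+ characters, no duplicates.
        if len(name) < 3 or href in seen:
            continue
        seen.add(href)
        businesses.append({"name": name, "url": href})
        if len(businesses) >= max_businesses:
            break
    return businesses
```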

Step 3: Extracting reviews

Yelp’s biggest challenge is unstable HTML. Standard CSS selectors often failed or returned the wrong elements.

By inspecting with Chrome DevTools, we found reviews inside paragraph tags with class names containing “comment”. Within them, span tags containing “raw” hold the actual review text. This pattern stays consistent even when class names change slightly.

This approach uses partial matching with [class*="comment"] and [class*="raw"]. This makes the scraper more resilient, since Yelp frequently changes exact class names. The script extracts the review text, username, rating, and date.
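
A minimal sketch of that extraction step; get_yelp_reviews is a hypothetical name, and the username, rating, and date lookups follow the same partial-matching pattern but are elided here because their selectors vary:

```python
import time
from selenium.webdriver.common.by import By

def get_yelp_reviews(driver, business_url, max_reviews=10):
    """Extract review text via partial class matches that survive renames."""
    driver.get(business_url)
    time.sleep(3)

    reviews = []
    # Each review body is wrapped in a <p> whose class contains "comment".
    for block in driver.find_elements(By.CSS_SELECTOR, "p[class*='comment']"):
        try:
            # The visible text sits in a <span> whose class contains "raw".
            text = block.find_element(
                By.CSS_SELECTOR, "span[class*='raw']").text.strip()
        except Exception:
            continue
        if text:
            reviews.append({"text": text})
        if len(reviews) >= max_reviews:
            break
    return reviews
```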

Step 4: Putting it all together

Now we combine everything into a workflow that searches for businesses, scrapes their reviews, and saves the results to CSV.
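
A sketch of the combined Yelp workflow; the search term, location, and filename are illustrative:

```python
import csv
import time

def main():
    driver = create_yelp_driver(headless=False)
    rows = []
    try:
        businesses = search_businesses(
            driver, "coffee", "San Francisco, CA", max_businesses=3)
        for biz in businesses:
            for review in get_yelp_reviews(driver, biz["url"], max_reviews=10):
                review.update(business_name=biz["name"],
                              business_url=biz["url"])
                rows.append(review)
            time.sleep(2)  # delay between businesses to avoid rate limiting
    finally:
        driver.quit()

    if rows:
        with open("yelp_reviews.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

if __name__ == "__main__":
    main()
```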

Example run

A two-second delay between businesses prevents rate limiting. In tests, two seconds was reliable, but you can reduce it to one second for small runs or increase it to five seconds for large-scale scraping.

Each review is tagged with the business name and URL before being saved, allowing you to trace the source.

With the default settings of 3 businesses and 10 reviews each, the script collects ~30 reviews in 2–3 minutes. Once stable, you can scale up:

  • 10 businesses × 20 reviews each: ~200 reviews in ~10 minutes
  • 20 businesses × 50 reviews each: ~1000 reviews in ~15–20 minutes

The CSV file includes columns for business name, URL, username, rating, date, and review text. It can be opened in Excel or imported into pandas for analyzing customer feedback.

How to Scrape Trustpilot Reviews with Python

Step 1: Setting up and searching for companies

Required libraries

We import the necessary libraries:

  • Requests: handles HTTP requests
  • BeautifulSoup: parses the HTML we receive
  • JSON: saves data in a structured format
  • time: adds delays to avoid overwhelming the server
  • quote from urllib.parse: encodes search terms for URLs
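
In code:

```python
import json  # saves data in a structured format
import time  # adds delays to avoid overwhelming the server

import requests                 # handles HTTP requests
from bs4 import BeautifulSoup   # parses the HTML we receive
from urllib.parse import quote  # encodes search terms for URLs
```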

The search function

The search term is URL-encoded with quote(). If a location is provided, it’s added as a parameter. Custom headers mimic a real browser to reduce blocking, with a User-Agent string that identifies us as Chrome on Windows.

Making the request and parsing results

We send the request with headers and parse results using BeautifulSoup. Because Trustpilot often changes class names, multiple selectors are defined.

Each selector targets links with /review/, which mark company pages. From each link, we extract the slug (unique identifier in the URL), clean it into a readable name, and return the first three companies found.
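
A sketch of this search function; search_companies is a hypothetical name, and the query/location parameter names are assumptions about Trustpilot's search URL:

```python
def search_companies(search_term, location=None, max_companies=3):
    """Search Trustpilot and return up to max_companies name/URL pairs."""
    url = f"https://www.trustpilot.com/search?query={quote(search_term)}"
    if location:
        url += f"&location={quote(location)}"  # parameter name is an assumption
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36"
    }
    response = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")

    # Try several selectors, since Trustpilot renames classes often.
    links = []
    for selector in ["a[href^='/review/']", "a[href*='/review/']"]:
        links = soup.select(selector)
        if links:
            break

    companies, seen = [], set()
    for link in links:
        slug = link["href"].split("/review/")[-1].split("?")[0]  # unique ID
        if not slug or slug in seen:
            continue
        seen.add(slug)
        name = slug.replace("-", " ").replace(".", " ").title()  # readable name
        companies.append({
            "name": name,
            "url": f"https://www.trustpilot.com/review/{slug}",
        })
        if len(companies) >= max_companies:
            break
    return companies
```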

Step 2: Fetching review data from company pages

The review-fetching function

The function loops through pages until the desired number of reviews is collected. Each page is requested with headers to avoid detection, and pagination is handled by incrementing the page parameter.

Extracting review cards

We try multiple selectors because Trustpilot frequently changes its design. Each selector targets possible review card structures. If none match, we stop scraping.

Parsing individual reviews

For each review, we extract the rating, title, review text, date, and username. Flexible selectors (with lambda) make the scraper resilient to HTML changes.

After processing each page, we add a two-second delay with time.sleep(2). This keeps the load on Trustpilot's servers reasonable and helps avoid rate limiting or IP bans.
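
A consolidated sketch of these three steps; fetch_reviews is a hypothetical name, and the card selectors and data-* attribute names are assumptions based on a snapshot of Trustpilot's markup, so they may need updating:

```python
def fetch_reviews(company_url, max_reviews=10):
    """Page through a company's reviews until max_reviews are collected."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                             "AppleWebKit/537.36 (KHTML, like Gecko) "
                             "Chrome/120.0.0.0 Safari/537.36"}
    reviews, page = [], 1
    while len(reviews) < max_reviews:
        response = requests.get(company_url, headers=headers,
                                params={"page": page}, timeout=30)
        soup = BeautifulSoup(response.text, "html.parser")

        # Try several card selectors; stop if none match the current layout.
        cards = []
        for selector in ["article", "div[class*='reviewCard']",
                         "section[class*='review']"]:
            cards = soup.select(selector)
            if cards:
                break
        if not cards:
            break

        for card in cards:
            # data-* attribute names below are assumptions; verify in DevTools.
            rating_el = card.find("div", attrs={"data-service-review-rating": True})
            user_el = card.find("span", attrs={"data-consumer-name-typography": True})
            # A lambda keeps the title lookup flexible when tags change.
            title_el = card.find(lambda t: t.name == "h2" and t.get_text(strip=True))
            text_el = card.find("p")
            date_el = card.find("time")
            reviews.append({
                "rating": rating_el.get("data-service-review-rating") if rating_el else None,
                "title": title_el.get_text(strip=True) if title_el else None,
                "text": text_el.get_text(strip=True) if text_el else None,
                "date": date_el.get("datetime") if date_el else None,
                "username": user_el.get_text(strip=True) if user_el else None,
            })
            if len(reviews) >= max_reviews:
                break

        page += 1
        time.sleep(2)  # be respectful between page requests
    return reviews
```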

Step 3: Main program and output

Main function setup

This main function defines the search term, location, and review limit. The location can be set to any country (e.g., “Germany”) or None for global results. The fallback ensures functionality even if the search fails.

Collecting and saving data

Each company’s reviews are stored in a dictionary with metadata (URL, review count). A 2-second delay is added between companies to respect Trustpilot’s servers. Finally, the results are saved to a JSON file with UTF-8 encoding.

Displaying results

The script prints a clean summary of all reviews. Each review displays the user, rating, title, and text. The .get() method ensures missing fields default to 'N/A'. Finally, the script confirms the total reviews scraped and the JSON filename.
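
A sketch of the main program, reusing the functions above; the search term, location, and fallback URL are placeholders:

```python
def main():
    search_term = "online banking"  # illustrative query
    location = None                 # e.g. "Germany", or None for global results
    max_reviews = 10

    companies = search_companies(search_term, location)
    if not companies:
        # Fallback so the script still produces output if the search fails
        # (placeholder page; substitute any known Trustpilot company URL).
        companies = [{"name": "Example Company",
                      "url": "https://www.trustpilot.com/review/example.com"}]

    results, total = {}, 0
    for company in companies:
        reviews = fetch_reviews(company["url"], max_reviews=max_reviews)
        results[company["name"]] = {
            "url": company["url"],           # metadata alongside the reviews
            "review_count": len(reviews),
            "reviews": reviews,
        }
        total += len(reviews)
        time.sleep(2)  # pause between companies

    with open("trustpilot_reviews.json", "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    for name, data in results.items():
        print(f"\n{name} ({data['review_count']} reviews)")
        for r in data["reviews"]:
            print(f"  {r.get('username') or 'N/A'} | "
                  f"{r.get('rating') or 'N/A'} stars | {r.get('title') or 'N/A'}")
            print(f"  {r.get('text') or 'N/A'}")
    print(f"\nScraped {total} reviews -> trustpilot_reviews.json")

if __name__ == "__main__":
    main()
```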

Final thoughts

Scraping reviews from Google Play, Yelp, and Trustpilot required different Python approaches. Each scraper exported ~30 reviews per run in CSV/JSON with usernames, ratings, dates, and text.

The key differences:

  • Google Play: Selenium plus the Google Play Scraper library; ~30 reviews in 3–4 minutes; CSV output.
  • Yelp: Selenium with an anti-detection setup; ~30 reviews in 2–3 minutes; CSV output.
  • Trustpilot: requests + BeautifulSoup with fallback selectors; JSON output.

