
Python Yellow Page Scraper: How to scrape Yellow Pages

Cem Dilmegani
updated on Sep 17, 2025

Yellow pages provide easy access to a variety of services and businesses that may not all show up in a Google search. Search engines rank results by relevance to the search term, whereas online yellow pages list results by geographic area. If you want better local reach for your business, it can be helpful to advertise in the online yellow pages for your region.

This article will guide you through the process of building your own web scraper to collect data from Yellow Pages directories.

How to scrape data from Yellow Pages with Python

Step 1: Install required libraries

Before we begin, you need to install the necessary Python libraries. Open your terminal or command prompt and run the following command:

pip install requests beautifulsoup4 pandas

This command installs requests for making HTTP requests, BeautifulSoup for parsing HTML, and pandas for data manipulation and export.

Step 2: Define the target URL

The first step in any scraping project is to identify the target URL. For YellowPages.az, the URLs are quite straightforward.

For example, searching for “OTEL” (the Azerbaijani word for “hotel”) and sorting the results by rating would use this URL:

https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings

Here:

  • search-title=OTEL defines the keyword we are searching for.
  • sort-by=ratings sorts the results based on their ratings.
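
You can also let requests build the query string from a dictionary instead of assembling the URL by hand. A minimal sketch (parameter names taken from the URL above):

import requests

# requests encodes the params dict into the query string for us
params = {"search-title": "OTEL", "sort-by": "ratings"}
r = requests.get("https://yellowpages.az/results/", params=params)
print(r.url)  # -> https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings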

Step 3: Fetch HTML content

Once you have your target URL, the next step is to download its HTML content. The requests library in Python is perfect for this. To ensure our request mimics a real browser and avoids potential blocking, we’ll also include a User-Agent header.

import requests
from bs4 import BeautifulSoup

url = "https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings"
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, "html.parser")

  • r.text contains the full HTML source code of the fetched page.
  • BeautifulSoup(r.text, "html.parser") transforms this raw HTML into a structured, navigable object, allowing us to easily search for and extract specific elements like business names, addresses, or phone numbers.
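
In practice, it is worth guarding this request against slow or failed responses. A hedged variant of the fetch above that adds a timeout and an HTTP status check, both standard requests features:

import requests
from bs4 import BeautifulSoup

url = "https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings"
headers = {"User-Agent": "Mozilla/5.0"}

# timeout stops the request from hanging indefinitely;
# raise_for_status() turns 4xx/5xx responses into exceptions
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")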

Step 4: Extract business details

4.1 Locating data with the inspect tool

Before writing any extraction code, we need to understand how the data is organized within the website’s HTML. This crucial step is performed using your browser’s “Inspect” tool (e.g., Chrome DevTools, Firefox Developer Tools).

Right-click a business listing on the results page and choose “Inspect” to see exactly which elements hold the name, address, and contact details you want to extract.

4.2 Extracting the Data with BeautifulSoup

With the knowledge of where each data point resides, we can now programmatically extract it. On YellowPages.az, each business listing is neatly encapsulated within a <div> element carrying the card-recording class.

We will iterate through these card-recording elements and, for each card, collect the following information:

  • Business name
  • Address
  • Phone numbers
  • Email
  • Website
  • Rating and number of votes

data = []
for card in soup.select(".card-recording"):
    name_tag = card.select_one("h4 a")
    name = name_tag.get_text(strip=True) if name_tag else None

    address = None
    address_tag = card.select_one(".fa-map-marker")
    if address_tag:
        address_block = address_tag.find_parent("div", class_="contact-block")
        if address_block:
            address = address_block.get_text(" ", strip=True)

    phones = []
    phone_tag = card.select_one(".fa-phone")
    if phone_tag:
        phone_block = phone_tag.find_parent("div", class_="contact-block")
        if phone_block:
            phones = [p.get_text(strip=True) for p in phone_block.select("a")]

    email = None
    email_tag = card.select_one(".fa-envelope")
    if email_tag:
        email_block = email_tag.find_parent("div", class_="contact-block")
        if email_block:
            link = email_block.select_one("a[href^='mailto:']")
            if link:
                email = link.get_text(strip=True)

    website = None
    web_tag = card.select_one(".fa-desktop")
    if web_tag:
        web_block = web_tag.find_parent("div", class_="contact-block")
        if web_block:
            link = web_block.select_one("a[href^='http']")
            if link:
                website = link["href"]

    rating_val = card.select_one('[itemprop="ratingValue"]')
    rating_count = card.select_one('[itemprop="ratingCount"]')

    data.append({
        "name": name,
        "address": address,
        "phones": "; ".join(phones) if phones else None,
        "email": email,
        "website": website,
        "rating": rating_val.get_text(strip=True) if rating_val else None,
        "votes": rating_count.get_text(strip=True) if rating_count else None
    })
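
The four contact lookups above repeat the same icon-then-parent pattern, so they can be collapsed into a small helper. A sketch of that refactor, using the same selectors as the loop above (behavior unchanged):

def contact_block(card, icon_selector):
    """Return the div.contact-block containing the given icon, or None."""
    icon = card.select_one(icon_selector)
    return icon.find_parent("div", class_="contact-block") if icon else None

# Inside the loop above, the address lookup then becomes:
block = contact_block(card, ".fa-map-marker")
address = block.get_text(" ", strip=True) if block else None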

Step 5: Save data to CSV

After the data is extracted and stored in our data list (a list of dictionaries), the next logical step is to convert it into a structured format and export the data.

We’ll use the pandas library for this, which excels at handling tabular data and can easily export it to a CSV file.

import pandas as pd

df = pd.DataFrame(data)
df.to_csv("hotels.csv", index=False, encoding="utf-8-sig")
print("✅ hotels.csv created")
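
The utf-8-sig encoding adds a byte-order mark so that Excel displays non-ASCII characters (common in Azerbaijani business names) correctly. If you need another format, the same DataFrame exports just as easily; for example:

df.to_json("hotels.json", orient="records", force_ascii=False, indent=2)
df.to_excel("hotels.xlsx", index=False)  # requires the openpyxl package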

Step 6: Filtering with URL parameters

YellowPages.az, like many other websites, uses URL query parameters to control how search results are displayed. By strategically adjusting these parameters, you can customize your search and data extraction.

Sorting Options (sort-by)

  • asc: Sorts results in alphabetical order (A → Z).
  • desc: Sorts results in reverse alphabetical order (Z → A).
  • ratings: Sorts results by their rating value.

Search Options (search-by)

  • brand: Searches specifically by business or brand name.
  • category: Searches within defined business categories.
  • numbers: Searches by phone numbers.

For example:

https://yellowpages.az/results/?search-title=OTEL&search-by=brand&sort-by=asc

This specific query will search for hotels by their brand name and then sort the results alphabetically.
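
These parameters combine freely, so you can sweep several variants in one run. An illustrative sketch that fetches the brand search under each sort order (reusing the headers from Step 3):

import requests

headers = {"User-Agent": "Mozilla/5.0"}
for sort_by in ("asc", "desc", "ratings"):
    params = {"search-title": "OTEL", "search-by": "brand", "sort-by": sort_by}
    r = requests.get("https://yellowpages.az/results/", headers=headers,
                     params=params, timeout=10)
    print(sort_by, r.status_code)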

Step 7: Pagination

One of the most common challenges in web scraping is dealing with pagination. YellowPages.az, like most directories, splits its search results across multiple pages.

If you only scrape the first page, you’ll miss a significant portion of the data. Proper pagination handling ensures that your scraping tool collects results from all available pages.

There are two primary methods to handle pagination on YellowPages.az:

7.1 Easy method (URL pattern)

The most straightforward approach is to observe how the page number changes in the URL.

  • Page 1: https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings
  • Page 2: https://yellowpages.az/results/page/2/?search-title=OTEL&sort-by=ratings
  • Page 3: https://yellowpages.az/results/page/3/?search-title=OTEL&sort-by=ratings

This clear pattern allows your data scraper to loop through pages by simply inserting the page number into the URL. With this method, you control how many pages to scrape by adjusting the range in the for loop.

Example snippet:

for page in range(1, 6):  # scrapes pages 1 through 5
    # note: /page/1/ typically redirects to the base results URL on WordPress sites
    url = f"https://yellowpages.az/results/page/{page}/?search-title=OTEL&sort-by=ratings"
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    cards = soup.select(".card-recording")
    if not cards:
        break  # Exit loop if no more business cards are found (end of results)
    # Extract data here (integrate Step 4.2 code)
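
Putting the pieces together, here is a hedged end-to-end sketch of the URL-pattern approach. It adds a short delay between requests to stay polite; the cards list stands in for the full Step 4.2 field extraction:

import time
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
all_cards = []

for page in range(1, 6):  # pages 1 through 5
    url = f"https://yellowpages.az/results/page/{page}/?search-title=OTEL&sort-by=ratings"
    r = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")
    cards = soup.select(".card-recording")
    if not cards:
        break  # no listings means we walked past the last page
    all_cards.extend(cards)  # run the Step 4.2 extraction on each card here
    time.sleep(1)  # pause between pages to avoid hammering the server

print(f"Collected {len(all_cards)} listings")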

7.2 DOM navigation

YellowPages.az also includes a navigation bar in its HTML, typically structured like this:

<div class="wp-pagenavi" role="navigation">
    <a class="page larger" href=".../page/3/?search-title=OTEL&sort-by=ratings">3</a>
    <a class="page larger" href=".../page/4/?search-title=OTEL&sort-by=ratings">4</a>
</div>

Example snippet with a limit:

# Initial scrape to get the first page and its pagination links
# (Assuming 'soup' is already populated from the first page)
pages = [a["href"] for a in soup.select(".wp-pagenavi a.page")]

# Process first page data here before looping through discovered pages
# ... (integrate Step 4.2 code for the initial 'soup') ...

for url in pages[:10]: # Scrapes up to 10 discovered additional pages
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    # Extract data here (integrate Step 4.2 code)
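
One caveat with DOM-discovered links: wp-pagenavi can repeat the same href across several anchors, so it is worth deduplicating while preserving order before looping:

pages = list(dict.fromkeys(pages))  # drop duplicate hrefs, keep first-seen order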

💡 Conclusion

Scraping YellowPages.az with Python offers a flexible approach to systematically collecting structured business data. By combining:

  • requests to efficiently fetch HTML content,
  • BeautifulSoup to parse and extract specific details,
  • pandas to organize the extracted information into a tabular format and export it,
  • URL parameters (search-by, sort-by) to filter and sort results according to your needs,
  • and pagination handling to capture data from all available pages,

you can build a repeatable pipeline that turns raw listing pages into a clean CSV dataset.


