Yellow pages provide easy access to a variety of services and businesses, not all of which show up in a Google search. Search engines rank results by relevance to the search term, whereas online yellow pages list results by geographic area. If you want better local reach for your business, it can be helpful to advertise in the online yellow pages for your region.
This article will guide you through the process of building your own web scraper to collect data from Yellow Pages directories.
How to scrape data from Yellow Pages with Python
Step 1: Install required libraries
Before we begin, you need to install the necessary Python libraries. Open your terminal or command prompt and run the following command:
pip install requests beautifulsoup4 pandas
This command installs requests for making HTTP requests, BeautifulSoup for parsing HTML, and pandas for data manipulation and export.
Step 2: Define the target URL
The first step in any scraping project is to identify the target URL. For YellowPages.az, the URLs are quite straightforward.
For example, searching for “OTEL” (the Azerbaijani word for “hotel”) and sorting the results by rating would use this URL:
https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings
Here:
- search-title=OTEL defines the keyword we are searching for.
- sort-by=ratings sorts the results based on their ratings.
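As a minimal sketch using only Python's standard library, the same URL can be assembled from a dictionary of parameters, which makes it easy to swap in other keywords later:
from urllib.parse import urlencode

base = "https://yellowpages.az/results/"
params = {"search-title": "OTEL", "sort-by": "ratings"}

url = f"{base}?{urlencode(params)}"
print(url)  # https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings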
Step 3: Fetch HTML content
Once you have your target URL, the next step is to download its HTML content. The requests library in Python is perfect for this. To ensure our request mimics a real browser and avoids potential blocking, we’ll also include a User-Agent header.
import requests
from bs4 import BeautifulSoup
url = "https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings"
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, "html.parser")
- r.text contains the full HTML source code of the fetched page.
- BeautifulSoup(r.text, "html.parser") transforms this raw HTML into a structured, navigable object. This allows us to easily search for and extract specific elements like business names, addresses, or phone numbers.
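In practice, it is also worth guarding this request before parsing. A minimal sketch (the timeout and the raise_for_status() call are our additions, not something the site requires):
import requests
from bs4 import BeautifulSoup

url = "https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings"
headers = {"User-Agent": "Mozilla/5.0"}

r = requests.get(url, headers=headers, timeout=15)  # fail fast if the connection stalls
r.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(r.text, "html.parser")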
Step 4: Extract business details
4.1 Locating data with the inspect tool
Before writing any extraction code, we need to understand how the data is organized within the website’s HTML. This crucial step is performed using your browser’s “Inspect” tool (e.g., Chrome DevTools, Firefox Developer Tools).
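Right-click a listing on the results page and choose “Inspect” to see its markup. As a rough, simplified illustration (based on the selectors used in the code below; the live site's attributes and nesting may differ), a single listing card looks something like this:
<div class="card-recording">
  <h4><a href="...">Business name</a></h4>
  <div class="contact-block"><i class="fa-map-marker"></i> Address</div>
  <div class="contact-block"><i class="fa-phone"></i> <a href="tel:...">Phone</a></div>
  <div class="contact-block"><i class="fa-envelope"></i> <a href="mailto:...">Email</a></div>
  <div class="contact-block"><i class="fa-desktop"></i> <a href="http://...">Website</a></div>
  <span itemprop="ratingValue">4.8</span> / <span itemprop="ratingCount">12</span>
</div>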

4.2 Extracting the data with BeautifulSoup
With the knowledge of where each data point resides, we can now programmatically extract it. On YellowPages.az, each business listing is contained in a <div> element with the class card-recording.
We will iterate through these card-recording elements and, for each card, collect the following information:
- Business name
- Address
- Phone numbers
- Email address
- Website
- Rating and number of votes
data = []

for card in soup.select(".card-recording"):
    # Business name: the heading link inside each card
    name_tag = card.select_one("h4 a")
    name = name_tag.get_text(strip=True) if name_tag else None

    # Address: find the map-marker icon, then read its parent contact block
    address = None
    address_tag = card.select_one(".fa-map-marker")
    if address_tag:
        address_block = address_tag.find_parent("div", class_="contact-block")
        if address_block:
            address = address_block.get_text(" ", strip=True)

    # Phone numbers: every link inside the phone icon's contact block
    phones = []
    phone_tag = card.select_one(".fa-phone")
    if phone_tag:
        phone_block = phone_tag.find_parent("div", class_="contact-block")
        if phone_block:
            phones = [p.get_text(strip=True) for p in phone_block.select("a")]

    # Email: the mailto: link inside the envelope icon's contact block
    email = None
    email_tag = card.select_one(".fa-envelope")
    if email_tag:
        email_block = email_tag.find_parent("div", class_="contact-block")
        if email_block:
            link = email_block.select_one("a[href^='mailto:']")
            if link:
                email = link.get_text(strip=True)

    # Website: the first external link inside the desktop icon's contact block
    website = None
    web_tag = card.select_one(".fa-desktop")
    if web_tag:
        web_block = web_tag.find_parent("div", class_="contact-block")
        if web_block:
            link = web_block.select_one("a[href^='http']")
            if link:
                website = link["href"]

    # Rating and vote count come from schema.org microdata attributes
    rating_val = card.select_one('[itemprop="ratingValue"]')
    rating_count = card.select_one('[itemprop="ratingCount"]')

    data.append({
        "name": name,
        "address": address,
        "phones": "; ".join(phones) if phones else None,
        "email": email,
        "website": website,
        "rating": rating_val.get_text(strip=True) if rating_val else None,
        "votes": rating_count.get_text(strip=True) if rating_count else None,
    })
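Note the pattern used throughout this loop: each contact detail is located by its icon class (.fa-map-marker, .fa-phone, .fa-envelope, .fa-desktop) and then read from the surrounding contact-block <div>, and every lookup falls back to None (or an empty list) when an element is missing. This way, listings with incomplete information are still collected instead of crashing the scraper.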
Step 5: Save data to CSV
After the data is extracted and stored in our data list (a list of dictionaries), the next logical step is to convert it into a structured format and export the data.
We’ll use the pandas library for this, which excels at handling tabular data and can easily export it to a CSV file.
import pandas as pd
df = pd.DataFrame(data)
df.to_csv("hotels.csv", index=False, encoding="utf-8-sig")
print("✅ hotels.csv created")
Step 6: Filtering with URL parameters
YellowPages.az, like many other web pages, uses URL query parameters to control how search results are displayed. By strategically adjusting these parameters, you can customize your search and data extraction.
Sorting options (sort-by)
- asc: Sorts results in alphabetical order (A → Z).
- desc: Sorts results in reverse alphabetical order (Z → A).
- ratings: Sorts results by their rating value.
Search options (search-by)
- brand: Searches specifically by business or brand name.
- category: Searches within defined business categories.
- numbers: Searches by phone numbers.
https://yellowpages.az/results/?search-title=OTEL&search-by=brand&sort-by=asc
This specific query will search for hotels by their brand name and then sort the results alphabetically.
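If you prefer not to format query strings by hand, requests can encode them for you through its params argument. A short sketch using the parameter values documented above:
import requests

params = {
    "search-title": "OTEL",
    "search-by": "brand",  # match by business/brand name
    "sort-by": "asc",      # alphabetical order (A → Z)
}
r = requests.get(
    "https://yellowpages.az/results/",
    params=params,
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=15,
)
print(r.url)  # the fully encoded request URL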
Step 7: Pagination
One of the most common challenges in web scraping is dealing with pagination. YellowPages.az, like most directories, splits its search results across multiple pages.
If you only scrape the first page, you’ll miss a significant portion of the data. Proper pagination handling ensures that your scraping tool collects results from all available pages.
There are two primary methods to handle pagination on YellowPages.az:
7.1 Easy method (URL pattern)
The most straightforward approach is to observe how the page number changes in the URL.
- Page 1: https://yellowpages.az/results/?search-title=OTEL&sort-by=ratings
- Page 2: https://yellowpages.az/results/page/2/?search-title=OTEL&sort-by=ratings
- Page 3: https://yellowpages.az/results/page/3/?search-title=OTEL&sort-by=ratings
This clear pattern allows your data scraper to loop through pages by simply inserting the page number into the URL. With this method, you control how many pages to scrape by adjusting the range in the for loop.
Example snippet:
for page in range(1, 6):  # scrapes pages 1 through 5
    url = f"https://yellowpages.az/results/page/{page}/?search-title=OTEL&sort-by=ratings"
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    cards = soup.select(".card-recording")
    if not cards:
        break  # exit the loop when no more business cards are found (end of results)
    # extract data here (integrate the Step 4.2 code)
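When looping over many pages, it is also considerate to pause between requests so you do not overload the server. This delay is our own addition, not a requirement of the site:
import time

for page in range(1, 6):
    # ... fetch and parse the page as shown above ...
    time.sleep(2)  # wait two seconds between requests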
7.2 DOM navigation
YellowPages.az also includes a navigation bar in its HTML, typically structured like this:
<div class="wp-pagenavi" role="navigation">
  <a class="page larger" href=".../page/3/?search-title=OTEL&sort-by=ratings">3</a>
  <a class="page larger" href=".../page/4/?search-title=OTEL&sort-by=ratings">4</a>
</div>
Example snippet with a limit:
# Initial scrape to get the first page and its pagination links
# (assuming 'soup' is already populated from the first page)
pages = [a["href"] for a in soup.select(".wp-pagenavi a.page")]

# Process the first page's data here before looping through the discovered pages
# ... (integrate the Step 4.2 code for the initial 'soup') ...

for url in pages[:10]:  # scrapes up to 10 discovered additional pages
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    # extract data here (integrate the Step 4.2 code)
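One caveat: pagination bars are sometimes rendered more than once per page (for example at the top and bottom of the results), so the pages list may contain duplicates. A small sketch that removes them while preserving order:
# dict.fromkeys keeps the first occurrence of each URL and preserves insertion order
pages = list(dict.fromkeys(pages))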
💡 Conclusion
Scraping YellowPages.az with Python offers a flexible approach to systematically collecting structured business data. By combining:
- requests to efficiently fetch HTML content,
- BeautifulSoup to parse and extract specific details,
- pandas to organize the extracted information into a tabular format and export the results,
- URL parameters (search-by, sort-by) to filter and sort results according to your needs, and
- pagination handling to capture data from all available pages,
you can build a complete scraper that turns a directory search into a clean, analysis-ready dataset.