Scraping real estate data is more complicated than it looks. Zillow quickly blocks bots with PerimeterX, while Redfin requires piecing together property details from scattered DOM elements. Every platform employs its own defenses, making reliable extraction a challenge without proxies or APIs.
This guide provides:
- A Python tutorial showing how to scrape Redfin with Selenium and save listings to CSV
- Quick answers with APIs and tools for less-technical users
You’ll also learn when to choose DIY code vs APIs, and what challenges to expect.
How to build a real estate web scraper (Python + Selenium)
Our initial attempts with Zillow were immediately blocked with 403 Forbidden responses due to PerimeterX anti-bot detection. Multiple approaches with different headers and delays all resulted in complete access denial.
We pivoted to Redfin, which proved more accessible with 200 OK responses using Selenium and proper timing. However, Redfin uses a component-based architecture where property data is fragmented across multiple DOM elements, creating assembly challenges.
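To see the difference for yourself, here is a minimal status-code probe (a sketch using the requests library; exact responses vary by network and over time) that reproduces the comparison:

import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/120.0.0.0 Safari/537.36'
}

# In our tests, Zillow answered 403 (PerimeterX) while Redfin returned 200.
for url in ('https://www.zillow.com/homes/', 'https://www.redfin.com/'):
    response = requests.get(url, headers=HEADERS, timeout=15)
    print(url, '->', response.status_code)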
Note: This tutorial uses a basic scraping approach only, with no API keys or proxy services.
1. Browser setup and configuration
The setup uses headless Chrome with a few essential arguments to ensure stability, and a realistic user agent helps avoid basic bot detection.
Page load timeout is set to 30 seconds to handle slow-loading JavaScript content. Window size is set to standard desktop resolution to ensure proper content rendering.
Here’s the basic setup for our scraper:
import re
import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
class RedfinBoulderScraper:
    def __init__(self):
        self.driver = None
        self.setup_driver()

    def setup_driver(self):
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        # A full, realistic user-agent string helps avoid basic bot detection.
        chrome_options.add_argument(
            '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/120.0.0.0 Safari/537.36'
        )
        self.driver = webdriver.Chrome(options=chrome_options)
        self.driver.set_page_load_timeout(30)    # tolerate slow JS-heavy pages
        self.driver.set_window_size(1920, 1080)  # standard desktop resolution
2. Extracting property listings from Redfin
The real estate scraper targets Boulder ZIP code 80302 and waits 5 seconds for JavaScript to load. It identifies elements with “Home” or “Property” in their class names to capture most listings.
Filtering removes empty or navigation elements, while duplicates are prevented through address comparison. Pagination is handled by scrolling, with a 4-second delay between loads to fetch additional listings.
    def scrape_boulder(self, max_listings=30):
        url = "https://www.redfin.com/zipcode/80302"
        self.driver.get(url)
        time.sleep(5)  # allow JavaScript-rendered listings to appear

        listings, seen = [], set()
        while len(listings) < max_listings:
            found_before = len(listings)
            elements = self.driver.find_elements(
                By.CSS_SELECTOR, '[class*="Home"], [class*="Property"]')
            for element in elements:
                if len(listings) >= max_listings:
                    break
                try:
                    text = element.text.strip()
                    if not text or len(text) < 20:
                        continue  # skip empty or trivially short elements
                    if "homes for sale" in text.lower() or "real estate" in text.lower():
                        continue  # skip navigation and header blocks
                    listing = self.extract_listing_data(text, len(listings) + 1)
                    if not listing['address'] or listing['address'] in seen:
                        continue  # require an address and dedupe on it
                    seen.add(listing['address'])
                    listings.append(listing)
                except Exception:
                    continue  # a stale or odd element shouldn't stop the run
            if len(listings) < max_listings:
                if len(listings) == found_before:
                    break  # scrolling produced nothing new; avoid an infinite loop
                self.driver.execute_script(
                    "window.scrollTo(0, document.body.scrollHeight);")
                time.sleep(4)  # pause between scroll loads
        return listings
3. Data extraction methods
Each field is extracted with regex patterns applied to the listing text:
- Price → detects dollar signs followed by numbers (handles comma formatting).
- Bedrooms & Bathrooms → works with different text variations and abbreviations.
- Square footage → captures multiple formats and removes commas for clean numeric values.
This step ensures that every listing has structured fields (price, beds, baths, sqft) ready for analysis or export.
    def extract_listing_data(self, text, index):
        return {
            'listing_number': index,
            'price': self.extract_price(text),
            'address': self.extract_address(text),
            'beds': self.extract_beds(text),
            'baths': self.extract_baths(text),
            'sqft': self.extract_sqft(text),
            'features': self.extract_features(text)
        }

    def extract_price(self, text):
        # Dollar sign followed by digits, with optional comma grouping.
        match = re.search(r'\$[\d,]+', text)
        return match.group(0) if match else ''

    def extract_beds(self, text):
        match = re.search(r'(\d+)\s*(?:bed|br|bedroom)', text, re.IGNORECASE)
        return match.group(1) if match else ''

    def extract_baths(self, text):
        # Allows half baths such as "2.5 ba".
        match = re.search(r'(\d+(?:\.\d+)?)\s*(?:bath|ba)', text, re.IGNORECASE)
        return match.group(1) if match else ''

    def extract_sqft(self, text):
        # Matches "1,250 sq ft", "1250 sqft", or "1250 sf"; strips commas.
        match = re.search(r'([\d,]+)\s*(?:sq\.?\s*ft|sqft|sf)', text, re.IGNORECASE)
        return match.group(1).replace(',', '') if match else ''
4. Address and feature extraction
Address extraction scans each text line for numbers followed by common street suffixes (e.g., St, Ave, Rd, Blvd, Unit), returning the most likely match as the property address.
Feature extraction searches the listing text for predefined amenity keywords (balcony, garage, pool, etc.), formats them, and limits results to five features to keep strings concise.
    def extract_address(self, text):
        # A line with a leading number followed by a street suffix is the
        # most likely address candidate.
        for line in text.split("\n"):
            if re.search(r'\d+\s+.*(St|Ave|Rd|Dr|Ln|Blvd|Way|Ct|Pl|Trail|Unit|Cir)',
                         line, re.IGNORECASE):
                return line.strip()
        return ''

    def extract_features(self, text):
        # Keyword scan against a fixed amenity list, capped at five features.
        features = ['balcony', 'garage', 'pool', 'fireplace', 'hardwood',
                    'renovated', 'mountain view']
        found = [f.title() for f in features if f in text.lower()]
        return ', '.join(found[:5]) if found else ''
5. Data storage
Once listings are extracted, the results are exported to a CSV file for further use. Each row contains the listing number, price, address, beds, baths, square footage, and detected features.
The scraper also includes a cleanup method to close the browser instance and free system resources once the process is complete.
    def save_csv(self, listings, filename="boulder_listings.csv"):
        if not listings:
            return
        with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
            fieldnames = ['listing_number', 'price', 'address', 'beds',
                          'baths', 'sqft', 'features']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(listings)

    def close(self):
        if self.driver:
            self.driver.quit()


def main():
    scraper = RedfinBoulderScraper()
    try:
        listings = scraper.scrape_boulder(max_listings=30)
        scraper.save_csv(listings, "boulder_listings.csv")
    finally:
        scraper.close()  # always release the browser, even on errors


if __name__ == "__main__":
    main()
Challenges and limitations of DIY real estate scraping
Redfin-specific issues
Redfin uses a component-based architecture that splits property details across multiple DOM elements. Performance also varies by geography: urban ZIP codes often time out, while suburban ones load more reliably. The site also relies on progressive loading, where containers render first and data fills in asynchronously.
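One way to cope with progressive loading is to replace fixed sleeps with Selenium's explicit waits. A minimal sketch, assuming a driver object like the one created in setup_driver() and the same selectors used in the tutorial above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block for up to 20 seconds until at least one property card exists,
# rather than sleeping a fixed 5 seconds and hoping the data arrived.
wait = WebDriverWait(driver, 20)
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, '[class*="Home"], [class*="Property"]')))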
Broader web scraping challenges
Modern real estate websites are JavaScript-heavy and protected by sophisticated anti-bot systems. Techniques such as behavioral analysis, fingerprinting, and geo-blocking render traditional scrapers unreliable.
At scale, serious scraping usually requires rotating IPs, browser fingerprint management, or API-based solutions.
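As a starting point, a single upstream proxy can be wired into the same Chrome setup via the --proxy-server flag. The address below is a placeholder; true rotation requires a proxy pool or a managed provider:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
# Placeholder endpoint -- substitute a real proxy from your own pool.
chrome_options.add_argument('--proxy-server=http://proxy.example.com:8080')
driver = webdriver.Chrome(options=chrome_options)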
Limitations of a DIY (do-it-yourself) real estate scraper
- Manual ZIP code input: each location must be entered individually (see the batching sketch after this list)
- No URL/Property ID extraction: only visible text data is captured
- Fragile performance: site changes can easily break the scraper
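The first limitation is the easiest to script away. A minimal batching sketch, assuming a hypothetical scrape_zip() method (a parameterized variant of scrape_boulder() that builds the URL from the ZIP code):

# Hypothetical batch run over several Boulder-area ZIP codes.
zip_codes = ['80302', '80303', '80304']
scraper = RedfinBoulderScraper()
all_listings = []
try:
    for zip_code in zip_codes:
        # scrape_zip() is assumed here, not defined in this tutorial.
        all_listings.extend(scraper.scrape_zip(zip_code, max_listings=30))
    scraper.save_csv(all_listings, 'multi_zip_listings.csv')
finally:
    scraper.close()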
Alternative approaches
For production-scale real estate data scraping:
- Official APIs: e.g., Zillow/Redfin APIs (where available)
- Professional scraping services: e.g., Bright Data, Oxylabs
- MLS data sources: direct access to Multiple Listing Service databases
- Specialized real estate data providers: APIs such as RentSpree or PadMapper
Real estate scraper APIs for enterprise-scale operations
If you don’t want to write code or maintain scrapers, there are easier ways to access real estate data through scraper APIs. For example, Bright Data provides dedicated scrapers for Zillow covering different data points, such as the following (a generic API-call sketch follows the list):
- Property listings by URL: collect details from individual Zillow listing pages
- Listings by search filters: extract results filtered by location, home type, or listing status
- Full property information: pull complete records including property price, address, size, and features
- Price history: gather historical pricing data for specific properties
- Search results by URL: capture multiple properties directly from Zillow search pages
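The snippet below is an illustrative sketch only: the endpoint, path, and payload are hypothetical placeholders, not Bright Data's actual API contract, which is documented by the provider. Hosted scraper APIs generally follow this request-and-fetch shape:

import requests

API_TOKEN = 'YOUR_API_TOKEN'  # issued by the provider

# Hypothetical endpoint and payload -- check your provider's docs
# for the real URL, parameters, and response format.
response = requests.post(
    'https://api.example-provider.com/v1/zillow/listings',
    headers={'Authorization': f'Bearer {API_TOKEN}'},
    json={'location': 'Boulder, CO', 'listing_status': 'for_sale'},
    timeout=60,
)
print(response.json())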

For enterprise web data requirements, these scraping solutions provide:
- Scalable infrastructure with proxy rotation, retries, and scheduling built in
- SLA-backed reliability, making them suitable for production pipelines
- Compliance support through licensed APIs and professional data providers