
How to Scrape Data from Twitter (X.com) with Python (No Twitter API)

Cem Dilmegani
updated on Nov 5, 2025

Modern social platforms such as X.com employ strict anti-scraping defenses, including CAPTCHAs, rate limits, and IP blocking. That’s why this guide uses a Twitter scraper API, which enables reliable, compliant Twitter data scraping by managing proxy rotation and ethical data collection for you.

In this article, you’ll learn how to scrape data from Twitter by keyword, collect URLs via Google Search, and extract detailed post-level data.

Build a Twitter profile scraper (public profiles, no API)

You can reuse the exact 4-step flow to scrape public profile data, such as bio, follower counts, post cadence, and verification status, without the official API.

How to adapt your pipeline:

  1. Discover profile URLs with Google: switch from site:x.com inurl:/status/ (which returns posts) to
    site:x.com -inurl:/status “profile_keyword” or site:x.com “@handle” to collect profile pages.
  2. Collect with a paid tool or your headless script, and maintain the 2-second Google delay.
  3. Poll every 10 seconds (with a 15-minute cap) and download the NDJSON.
  4. Export CSV. Use fields like user_posted, name, followers, posts_count, is_verified, profile_image_link, biography, user_id.

What you get:
A clean dataset to rank creators by influencer score = normalized engagement × log10(followers). This answers “who to watch” for Twitter web scraping workflows and powers outreach lists for your Twitter scraper dashboards.
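Here is a minimal ranking sketch, assuming you have exported the profile CSV from step 4 and added an avg_engagement column yourself (for example, mean likes per recent post); the file and column names are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical input: the profile CSV from step 4, plus an "avg_engagement"
# column you compute from post-level data (names are illustrative).
df = pd.read_csv("x_profiles.csv")

# Normalize engagement to a 0-1 range so the score stays comparable across runs.
eng = df["avg_engagement"].astype(float)
df["engagement_norm"] = (eng - eng.min()) / (eng.max() - eng.min() + 1e-9)

# influencer score = normalized engagement x log10(followers)
df["influencer_score"] = df["engagement_norm"] * np.log10(df["followers"].clip(lower=1))

# Rank creators for the "who to watch" list.
top = df.sort_values("influencer_score", ascending=False)
print(top[["name", "followers", "influencer_score"]].head(10))
```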

How to scrape Twitter data using Python

Step 1: Set up your environment for Twitter web scraping

Before you start scraping Twitter data, you’ll need to prepare your Python environment.

In this step, you’ll import the necessary libraries, add your API credentials (we used the Bright Data Twitter scraper API), configure a proxy, and define your search parameters.

You’re preparing your workspace so your Twitter scraping Python script can run smoothly and connect to the scraper.

  • Import the libraries you’ll use for requests, data parsing, and saving results.
  • Add your credentials; you’ll find the API token and dataset ID in your dashboard.
  • Configure a proxy to route your traffic safely and avoid IP blocks while web scraping Twitter content.
  • Set your keyword and limit. In this example, you’re tracking “AI agent optimizing” and collecting five posts, but you can raise NUM_POSTS to expand your Twitter data scraping scope (a minimal setup sketch follows this list).
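In this sketch, the token, dataset ID, and proxy values are placeholders; copy the real ones from your provider’s dashboard:

```python
import json
import time

import pandas as pd
import requests

# Placeholder credentials: replace with the values from your scraper dashboard.
API_TOKEN = "YOUR_API_TOKEN"
DATASET_ID = "YOUR_DATASET_ID"

# Optional proxy for outbound requests; leave empty to connect directly.
PROXIES = {
    # "http": "http://user:pass@proxy_host:port",
    # "https": "http://user:pass@proxy_host:port",
}

# Search parameters: the keyword to track and how many posts to collect.
KEYWORD = "AI agent optimizing"
NUM_POSTS = 5
```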

Step 2: Find X post URLs to scrape

In this step, you’ll use Google search to collect public X post (tweet) links that match your keyword. This simple trick lets you scrape Twitter without API access by querying only X/Twitter URLs.

This script constructs a Google query, such as ‘site:x.com OR site:twitter.com <keyword>’, to return only X/Twitter posts. It extracts tweet URLs, cleans them, converts old twitter.com links to x.com, and removes duplicates.

A 2-second delay is included between requests to respect Google’s servers while collecting enough unique URLs for your Twitter data scraping workflow.
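A simplified discovery sketch is shown below. Google may throttle or block plain HTTP clients, so a headless browser or a search API is more dependable in practice; the request parameters and regex here are illustrative.

```python
import re
import time

import requests

def find_post_urls(keyword: str, max_urls: int = 5) -> list[str]:
    """Collect public X post URLs from a Google 'site:' query (simplified sketch)."""
    query = f'site:x.com OR site:twitter.com "{keyword}"'
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 20},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    # Pull tweet links out of the raw HTML and normalise twitter.com -> x.com.
    links = re.findall(r"https?://(?:x|twitter)\.com/\w+/status/\d+", resp.text)
    cleaned = []
    for url in links:
        url = url.replace("twitter.com", "x.com")
        if url not in cleaned:          # dedupe while preserving order
            cleaned.append(url)
    time.sleep(2)                       # 2-second delay to respect Google's servers
    return cleaned[:max_urls]

post_urls = find_post_urls("AI agent optimizing", 5)
print(post_urls)
```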

Step 3: Trigger Twitter scraping

Send the collected URLs to the scraper.

Once we have collected all the X post URLs, we need to send them to the web scraper for data extraction. This section makes a POST request to Bright Data’s trigger endpoint with our authentication token and dataset ID, the same method that many Twitter web scraping pipelines use when managing external data collection.

The URLs are formatted as a list of JSON objects, with each object containing a single post URL. When the API receives this request successfully, it returns a snapshot ID, which acts as a reference for this particular scraping job.

If the API call fails for any reason, the script exits with an error message. This step forms the foundation of Twitter data scraping, a scalable and compliant approach for anyone learning how to safely and efficiently scrape Twitter data without relying on the official API.
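The trigger call looks roughly like the sketch below. The endpoint path, query parameters, and response field name are assumptions about the provider’s dataset API; confirm the exact details in your provider’s documentation.

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"
DATASET_ID = "YOUR_DATASET_ID"
post_urls = ["https://x.com/someone/status/1234567890"]  # collected in step 2

# Illustrative endpoint; check your provider's docs for the exact path.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"

resp = requests.post(
    TRIGGER_URL,
    params={"dataset_id": DATASET_ID},
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=[{"url": url} for url in post_urls],  # one JSON object per post URL
    timeout=60,
)
if resp.status_code != 200:
    raise SystemExit(f"Trigger failed: {resp.status_code} {resp.text}")

snapshot_id = resp.json().get("snapshot_id")  # reference for this scraping job
print("Snapshot ID:", snapshot_id)
```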

Step 4: Full code and save the scraped X.com data

The final section waits for the scraper to finish and then retrieves the results for your Twitter web scraping workflow. Because scraping can take time, your script polls the snapshot status every 10 seconds with a 15-minute timeout. When the status becomes “ready” or “done,” it downloads the dataset via the provided URL.

The response arrives as NDJSON, so each line is parsed into a Python dictionary. After all data is collected, the script prints each post’s URL, description, and engagement metrics (likes, views, reposts, replies, hashtags). Finally, everything is organized into a pandas DataFrame and exported to CSV for reporting or modeling.

The try/except blocks ensure numeric fields are converted safely (handling unexpected formats), which makes this approach reliable for Twitter data pipelines and for anyone learning how to scrape Twitter data without the official API.
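A sketch of the polling and export logic follows. The status and snapshot endpoints, the status values, and the field names mirror the description above but are assumptions; adjust them to match what your dataset actually returns.

```python
import json
import time

import pandas as pd
import requests

API_TOKEN = "YOUR_API_TOKEN"
snapshot_id = "YOUR_SNAPSHOT_ID"  # returned by the trigger call in step 3
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

# Illustrative endpoints; confirm the exact paths in your provider's docs.
STATUS_URL = f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}"
DATA_URL = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=ndjson"

def to_int(value):
    """Convert a numeric field defensively; unexpected formats become 0."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return 0

records = []
deadline = time.time() + 15 * 60                 # 15-minute timeout
while time.time() < deadline:
    status = requests.get(STATUS_URL, headers=HEADERS, timeout=30).json().get("status")
    if status in ("ready", "done"):
        raw = requests.get(DATA_URL, headers=HEADERS, timeout=60).text
        # NDJSON: each line is one JSON object.
        records = [json.loads(line) for line in raw.splitlines() if line.strip()]
        break
    time.sleep(10)                               # poll every 10 seconds

rows = [{
    "url": post.get("url"),
    "description": post.get("description"),
    "likes": to_int(post.get("likes")),
    "views": to_int(post.get("views")),
    "reposts": to_int(post.get("reposts")),
    "replies": to_int(post.get("replies")),
    "hashtags": post.get("hashtags"),
} for post in records]

df = pd.DataFrame(rows)
df.to_csv("x_posts.csv", index=False)
print(df.head())
```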

Benchmark: Performance & reliability (paid tool vs open-source)

If you’re serious about Twitter web scraping at scale, measure throughput, success rate, and maintenance time.

We ran three configs with the same topics:

  1. A paid tool (managed scraping provider)
  2. SN-Scraper (open-source)
  3. A custom headless browser script

Each configuration collected public posts, parsed engagement metrics, and saved the data to a CSV.

What we observed:

  • Throughput (tweets/min): paid Twitter scrapers > headless browser > SN-Scraper.
  • Success rate: The paid tool handled layout/auth changes most consistently.
  • Engineering time: open-source options needed the most patching after site changes.

Takeaway: For one-off research, open-source is a great option. For ongoing Twitter data scraping, paid web scraping tools can reduce breakage and hidden costs, especially when you need to scrape Twitter data continuously or across many topics.

Politeness and anti-block playbook

The following points stabilize your Twitter scraping Python runs and reduce blocks.

  • Pacing: Keep the 2-second delay during Google discovery and back off exponentially (10→20→40s) on repeated timeouts, as in the retry sketch after this list.
  • Rotate identities: Use rotating IPs/user-agents (a paid tool usually automates this) for scraping Twitter data at scale.
  • Limit concurrency: Start 3–5 workers; increase only if error rate stays low.
  • Cache & dedupe: Don’t re-fetch the same post; store IDs and the last-seen timestamp.
  • Distribute schedules: Spread runs across the day.
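For the pacing point above, a small retry helper with exponential backoff might look like this (the function name and defaults are illustrative):

```python
import time

import requests

def get_with_backoff(url: str, max_retries: int = 3, base_delay: float = 10.0):
    """Retry a GET request, doubling the wait (10 -> 20 -> 40s) after each timeout."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return requests.get(url, timeout=30)
        except requests.exceptions.Timeout:
            if attempt == max_retries - 1:
                raise                    # give up after the final attempt
            time.sleep(delay)
            delay *= 2                   # exponential backoff between attempts
```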

How to choose the best way to get Twitter data

  • Need predictable success & low maintenance?
    • Choose a paid tool. It’s the most resilient for ongoing Twitter data scraping and multi-topic monitoring.
  • Need structured, governed access?
    • If budgets and limits are OK, the Official API is the cleanest.
  • Just exploring? Small budget?
    • Go with an open-source option such as SN-Scraper; it’s well suited to one-off research.
  • Have unique requirements (logins, sequencing, dynamic actions)?
    • Build a DIY headless with solid proxy hygiene and observability.

Use this comparison to pick what matches your budget, timelines, and risk tolerance for scraping tweets.

Turn this pipeline into a Twitter aggregator (scheduling + dashboards)

Once your Python Twitter scraper is running, you can easily evolve it into a Twitter aggregator that continuously collects and visualizes public X.com posts around specific topics, hashtags, or influencers. An aggregator is simply an automated system that:

  • Collects posts from multiple sources or keywords
  • Cleans and stores the data regularly (hourly or daily)
  • Displays insights in a dashboard for quick analysis

Your 4-step tutorial already performs all the core functions (discovery, scraping, and export), which makes it a suitable foundation for an automated aggregator.

How to build your Twitter aggregator

  1. Schedule regular runs: Use a cron job or workflow scheduler to run your script automatically (e.g., every hour). Rotate through a list of topics or hashtags each time.
  2. Dedupe and append new data: After each run, check for duplicates using URL or ID and append only fresh posts to your CSV or database (a dedupe sketch follows this list). Organize results by day (/data/x_posts/YYYY-MM-DD/) so they’re easy to query later.
  3. Transform for dashboards: Load your CSVs into Google Data Studio, Tableau, or Python notebooks to visualize:
    • Volume of posts per hour/day
    • Top authors or hashtags
    • Engagement trends (likes, views, reposts)
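A dedupe-and-append sketch for step 2 above, assuming the DataFrame has a url column and using the day-partitioned folder layout mentioned earlier (paths and file names are illustrative):

```python
import os
from datetime import date

import pandas as pd

def append_new_posts(new_df: pd.DataFrame, data_dir: str = "data/x_posts") -> pd.DataFrame:
    """Append only posts whose URL hasn't been stored yet, partitioned by day."""
    day_dir = os.path.join(data_dir, date.today().isoformat())  # data/x_posts/YYYY-MM-DD/
    os.makedirs(day_dir, exist_ok=True)
    path = os.path.join(day_dir, "posts.csv")

    if os.path.exists(path):
        existing = pd.read_csv(path)
        fresh = new_df[~new_df["url"].isin(existing["url"])]    # drop duplicates by URL
        combined = pd.concat([existing, fresh], ignore_index=True)
    else:
        combined = new_df

    combined.to_csv(path, index=False)
    return combined
```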

Use query patterns as a Twitter finder (people & posts)

Your discovery step can do more than find posts. It can help you find people, influencers, and key accounts on X.com using Google search operators. This makes your scraper double as a Twitter finder for both user profiles and topic-related tweets.

What is a Twitter finder?

A Twitter finder is a search workflow that identifies:

  • People or profiles based on job title, bio, or industry keywords
  • Tweets or posts based on specific topics, hashtags, or timeframes

You’ll still rely on Google’s site:x.com operator to uncover public pages that match your keywords, without requiring the Twitter API.

Query patterns to find profiles:

These patterns help you collect author pages (not tweets). Feed those URLs into your scraper to extract fields like user_posted, name, followers, is_verified, and biography. To locate profiles, try:
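For example, reusing the operators from the profile-scraper section above (swap in your own keyword or handle):

  • site:x.com -inurl:/status “profile_keyword” to surface profile pages whose name or bio matches a keyword
  • site:x.com “@handle” to find a specific account’s profile page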

This transforms your project into a straightforward Twitter profile scraper, ideal for influencer discovery, recruitment, or marketing research.

Query patterns to find posts:

To focus on tweets or posts, use:
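For example (the keyword, hashtag, and date are placeholders):

  • site:x.com OR site:twitter.com inurl:/status/ “your keyword” to return only individual posts
  • site:x.com inurl:/status/ “#yourhashtag” to focus on a hashtag
  • Add Google’s after:2024-01-01 operator to narrow results to a timeframe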

These techniques enhance both recall (by displaying more relevant tweets) and precision (by reducing the number of irrelevant results). By applying these query tricks, your scraper becomes an accurate Twitter finder.

Troubleshoot missing previews with a Twitter debugger (metadata tips)

People searching for a Twitter debugger usually want to fix link previews (Twitter Cards/Open Graph). While that isn’t scraping, it’s adjacent and worth covering.

  • Ensure target pages include og:title, og:description, og:image, and the correct twitter:card meta tag (a quick Python check is sketched after this list).
  • Verify images are HTTPS, accessible, and within size limits.
  • Re-share after metadata updates.
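The sketch below checks a page for those tags programmatically, assuming requests and beautifulsoup4 are installed; the URL is a placeholder.

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

REQUIRED = ["og:title", "og:description", "og:image", "twitter:card"]

def check_card_metadata(url: str) -> dict:
    """Report which preview-related meta tags a page exposes (None = missing)."""
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for name in REQUIRED:
        tag = soup.find("meta", attrs={"property": name}) or soup.find("meta", attrs={"name": name})
        found[name] = tag.get("content") if tag else None
    return found

# Example: replace with the page whose preview is broken.
print(check_card_metadata("https://example.com/article"))
```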

💡 Conclusion

You now have a repeatable way to scrape Twitter data that balances speed, compliance, and maintainability:

  • Use Google Discovery with a 2-second delay to find high-signal URLs (avoid chasing fragile endpoints).
  • Offload rendering, rotation, and anti-bot to a paid tool for steady results.
  • Poll, download NDJSON, and export CSV with clean numeric fields so dashboards, notebooks, or apps “just work.”
  • Add small touches (dedupe, retries, backoff) and your Twitter scraping Python job stays boring (in a good way).

If you need an API-free starting point, this workflow is the sweet spot. If you need governed access or deeper historical pulls later, layer in the official API. Either way, you’re shipping insights now with a pipeline that’s simple to run and easy to scale for web scraping Twitter projects.


Cem Dilmegani
Principal Analyst
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.

Comments

Jones
Sep 20, 2023 at 12:10

You cannot access tweets for free using the API. Twitter (X) charges developers at minimum $100/month to use the API to access tweets. The free developer option is limited to posting only, which is not what you'd want to scrape Twitter for anyway.

Cem Dilmegani
Nov 01, 2023 at 17:31

Indeed, we updated that section, thank you for the heads up!