AIMultiple Research
We follow ethical norms & our process for objectivity.
This research is not funded by any sponsors.
Updated on May 23, 2025

How to Scrape X.com (Twitter) with Python and Playwright

We used Python and Playwright to test Twitter (X) data collection methods, focusing on the most prominent page types:

Page type | Best method | Reliability
Profile | Unblocker | High
Hashtag | Unblocker | Medium
Search | Not found | N/A

Twitter scraping methodology

We performed all tests without logging in, because our aim was to collect only public data.

Pages to be scraped

  1. User profiles to get bio text, follower numbers, and join date (i.e. date when the user joined X.com)
  2. Hashtag pages to see tweets under a specific hashtag
  3. Search result pages to find tweets based on keywords

Scraping methods

A combination of web automation tools and network configurations was used to manage dynamic content and bypass anti-bot protections:

  1. Python + Playwright (sync API): We used Playwright’s synchronous API. Playwright drives a real browser, which suits JavaScript-heavy sites like Twitter.
  2. Chromium browser (non-headless mode): We ran Chromium with a visible window because Twitter’s anti-bot systems are more likely to block headless sessions.
  3. Proxy configurations: We tested three network setups:
    • No proxy: We connected directly over our own internet connection, without any proxy. This approach does not scale.
    • Residential proxy
    • Web unblocker
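
The setup above can be sketched roughly as follows. This is a minimal illustration, not the exact code used in these tests: the helper names (build_launch_options, fetch_rendered_html), the CONFIGS dictionary, and the proxy hosts and credentials are all hypothetical placeholders. The only Playwright calls used are chromium.launch() with its headless and proxy arguments, new_page(), goto(), and content().

```python
from typing import Optional


def build_launch_options(proxy_server: Optional[str] = None,
                         username: Optional[str] = None,
                         password: Optional[str] = None) -> dict:
    """Build keyword arguments for Playwright's chromium.launch()."""
    # Visible (non-headless) browser, matching the methodology described above.
    options = {"headless": False}
    if proxy_server:
        proxy = {"server": proxy_server}
        if username and password:
            proxy["username"] = username
            proxy["password"] = password
        options["proxy"] = proxy
    return options


def fetch_rendered_html(url: str, launch_options: dict, timeout_ms: int = 30000) -> str:
    """Open the URL in Chromium and return the page HTML after navigation."""
    # Imported here so build_launch_options() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


# Hypothetical endpoints: replace with your provider's host, port, and credentials.
CONFIGS = {
    "no_proxy": build_launch_options(),
    "residential": build_launch_options("http://proxy.example.com:8000", "user", "pass"),
    "unblocker": build_launch_options("http://unblocker.example.com:60000", "user", "pass"),
}
```

Keeping the option builder separate from the browser call makes it easy to run the same page through all three network setups and compare the results.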

Twitter web scraping results by proxy configuration

Page type | No proxy | Residential | Unblocker
Profile page | Partial (bio & join date only) | Mostly successful | Successful
Hashtag page | Fails (empty or blocked) | Fails (challenge) | Fails (502 / timeout)
Search page | Fails | Fails | Fails (cert/auth errors)

While scraping Twitter data, we first tried to read the bio, follower count, and join date from the page’s raw HTML, using page.content() or by checking network responses. Some information, such as follower counts, was missing because those parts of the page are loaded later with JavaScript.

To solve this, we used CSS selectors against the fully loaded (rendered) page. This approach is often necessary when scraping Twitter, but not all elements are equally easy to grab: some change more often, and some load slowly.

Using a tool like a web unblocker improved consistency. It ensures the page loads and behaves much as it would for a real user, which reduces errors and makes the CSS selectors more reliable.
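
One way to handle content that appears only after JavaScript has run is to poll the rendered page until the target element exists. The sketch below is an illustration under assumptions, not the code used in these tests: wait_for_text is a hypothetical helper, and it only assumes that page behaves like a Playwright sync Page, where query_selector() returns an element handle with inner_text() or None when nothing matches.

```python
import time
from typing import Callable, Optional


def wait_for_text(page, selector: str, timeout_s: float = 30.0, poll_s: float = 0.5,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Poll the rendered page until `selector` matches, then return its text.

    Returns None if the element has not appeared once `timeout_s` has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        element = page.query_selector(selector)
        if element is not None:
            return element.inner_text().strip()
        if clock() >= deadline:
            return None
        sleep(poll_s)
```

Playwright also ships a built-in page.wait_for_selector() that raises on timeout. The polling version here returns None instead, which can be convenient when a missing field should be recorded rather than treated as a failure.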

CSS selector reliability on profile pages

Even if your Twitter scraper gets through the platform’s proxy detection methods, it may fail because the data is no longer where you expect it to be. We tested the reliability of different selectors used to extract common profile fields:

Field | CSS selector | Reliability
Bio | div[data-testid="UserDescription"] span | High
Followers | a[href$="/followers"] span span | Medium-low
Following | a[href$="/following"] span span | Medium-low
Joined date | span[data-testid="UserJoinDate"] | High
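
The selectors in the table can be applied to a rendered page along these lines. This is a sketch under assumptions: extract_profile_fields and PROFILE_SELECTORS are hypothetical names, and page is assumed to be a Playwright sync Page (or anything exposing a compatible query_selector()). A field whose selector does not match is returned as None rather than raising, since the less reliable selectors are expected to miss from time to time.

```python
# Selectors copied from the table above; X.com may change them at any time.
PROFILE_SELECTORS = {
    "bio": 'div[data-testid="UserDescription"] span',
    "followers": 'a[href$="/followers"] span span',
    "following": 'a[href$="/following"] span span',
    "joined": 'span[data-testid="UserJoinDate"]',
}


def extract_profile_fields(page, selectors=PROFILE_SELECTORS) -> dict:
    """Return {field: text or None} for each selector on an already loaded page."""
    result = {}
    for field, selector in selectors.items():
        element = page.query_selector(selector)
        # query_selector() returns None when nothing matches the selector.
        result[field] = element.inner_text().strip() if element is not None else None
    return result
```

Counting the None values per field over many runs is a simple way to measure selector reliability for your own setup.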

1. Profile page scraping

We extracted four common data fields from Twitter profile pages: bio, follower count, following count, and join date. Below is a summary of how each proxy configuration performed:

Proxy type | Bio | Followers | Following | Joined date
No proxy
Residential
Unblocker
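
Follower and following counts are extracted as display text rather than numbers. The sketch below converts such text to an integer. It assumes that counts are shown either with thousands separators (for example 1,234) or abbreviated with K, M, or B suffixes (for example 12.3K); that display format is our assumption, and parse_count is a hypothetical helper. Abbreviated values are rounded by the site, so the result is approximate.

```python
import re
from typing import Optional

_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)([KMB]?)")


def parse_count(text: Optional[str]) -> Optional[int]:
    """Convert display text such as '1,234' or '12.3K' to an integer.

    Returns None when the text is empty or not in a recognized format.
    """
    if not text:
        return None
    cleaned = text.strip().replace(",", "").upper()
    match = _COUNT_RE.fullmatch(cleaned)
    if match is None:
        return None
    number, suffix = match.groups()
    return int(round(float(number) * _SUFFIXES[suffix]))
```

Returning None for unrecognized text keeps a changed page layout from silently producing wrong numbers.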

2. Hashtag page scraping

Unlike profile pages, Twitter hashtag feeds are harder to scrape reliably. When accessing these pages without a proxy, we were often redirected or faced CAPTCHA challenges. Residential proxies performed slightly better, allowing the page to begin loading, but they still failed to deliver usable data consistently.

The only method that successfully rendered and accessed the hashtag feed was the unblocker.

Proxy type | Page load | Data access
No proxy
Residential | ⚠️*
Unblocker

*Represents a partial result. The hashtag page began loading, but the tweet data didn’t appear.

3. Search page scraping

Search result pages on Twitter are the most difficult to access via web scraping, because they combine rate limiting, SSL certificate checks, and dynamic behavior that blocks web scraping tools.

  • Connections without proxies failed.
  • Residential proxies had slightly better page load performance, but the data rarely became available. Attempts often ended in timeouts or HTTP 502 errors.
  • Even the unblocker, which successfully handled other Twitter endpoints, failed here. It triggered certificate errors (like ERR_CERT_AUTHORITY_INVALID) and inconsistent server responses.
Proxy type | Page load | Data access
No proxy
Residential | ⚠️
Unblocker

Technical challenges in Twitter web scraping

We encountered various recurring errors during our scraping tests, especially when accessing more protected pages like search results. These issues often stemmed from SSL problems, slow-loading pages, or aggressive anti-bot protections on Twitter.

Here is a breakdown of the most common errors and what likely caused them:

  • ERR_CERT_AUTHORITY_INVALID: This error points to SSL certificate issues. These are often caused by misconfigured proxies or by advanced proxy tools such as the unblocker, which inject their own certificates.
  • Timeout 30000ms exceeded: A common Playwright error indicating that the page took too long to load. This typically happens due to heavy JavaScript rendering or slow proxy connections that delay full page hydration.
  • 502 Server Error or read timeout: These errors suggest the server blocked or dropped the request, especially when accessing search pages. Twitter may be actively denying access to automated traffic.
  • Event loop is closed!: This is a Playwright or asyncio-related error that usually occurs after a crash, an abrupt disconnection, or an incomplete async response. It often requires resetting the browser context or reinitializing the session.
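
The errors listed above call for different reactions, so a scraper can classify the error message before deciding whether to retry. The following is a rough sketch, not the handling used in these tests: classify_error, run_with_retries, the category names, and the backoff values are all hypothetical, and the matching is a simple substring heuristic on the message text.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Checked in order; "read timeout" must come before the generic "timeout".
ERROR_HINTS = [
    ("err_cert_authority_invalid", "certificate"),
    ("event loop is closed", "session_reset"),
    ("502", "blocked_or_dropped"),
    ("read timeout", "blocked_or_dropped"),
    ("timeout", "timeout"),
]


def classify_error(message: str) -> str:
    """Map an error message to one of the categories described above."""
    lowered = message.lower()
    for needle, category in ERROR_HINTS:
        if needle in lowered:
            return category
    return "unknown"


def run_with_retries(task: Callable[[], T], attempts: int = 3,
                     base_delay_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task(), retrying failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            if classify_error(str(exc)) == "certificate":
                break  # a retry will not fix a certificate problem
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    raise last_error
```

For the session_reset category, task should create a fresh browser and context on every call instead of reusing a crashed one. For certificate errors behind a proxy that injects its own certificate, Playwright's browser.new_context(ignore_https_errors=True) option exists, but we did not verify whether it resolves the failures reported here, and it disables certificate validation for that context.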

Twitter web scraping findings

While specific errors were more technical (e.g., timeouts, SSL issues), some challenges were inherent to each Twitter page type’s structure and protection level. After testing multiple proxy setups, we found clear differences in reliability across profile, hashtag, and search pages:

  • Unblocker is the most reliable option for scraping profile pages, especially when retrieving follower/following counts.
  • Hashtag and search result pages remain difficult to scrape with any method tested.
  • Running a visible (headful) browser helps reduce bot detection, but it is not enough.
  • JavaScript rendering and hydration delays (i.e., the time it takes for data to fully appear on the page) significantly affect scraping accuracy.

FAQ about web scraping Twitter

What is Twitter data?

Twitter is usually pictured as a feed of tweets, each with its author and like count. However, you can collect more detail than that:
  • Keywords/hashtags: You can pull a set number of tweets that contain a specific keyword or hashtag, or a combination of them. You can narrow the results to a particular event or level of influence by filtering on date or number of likes.
  • Tweets: You can pull all the tweets of specified profiles and filter them further, for example to tweets that contain a URL or tweets that were retweeted.
  • Profiles: You can collect all the information on a Twitter user’s public account. Anything you see on their page, such as their bio, number of followers, or tweets, is returned in a structured format along with the profile owner.

Though this is not legal advice, in most jurisdictions it is legal to scrape publicly available data from Twitter (e.g., anything you can see without logging in to the website).
Private data is different: if a user’s profile is private, you can’t scrape, share, or use its data for any purpose, even if you follow that person and can access the profile. That said, websites like Twitter generally do not want to be scraped, because scraping brings excess traffic to their website and reduces the scarcity of their data. Therefore, they try to block web scrapers.

Is web scraping better than the Twitter API?

Paid API to retrieve tweets
This API is more expensive (i.e., Pro level with read access starts at $5k/month) than other options.1
The most significant advantage of the API is that, since Twitter supports it, there is no risk of being blocked as long as you follow its API guidelines. However, the API limits how far back in time you can pull data and how many tweets you can pull per minute. These rules can change from year to year and should be checked against Twitter’s most up-to-date guidelines.
Free write-only API for developers
Twitter provides free API access for write-only use cases.1
You need to register your use case at the Twitter Developer website. If your use case is confirmed, they will share your API key within a few days.

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.

Comments

1 Comments
Jones
Sep 20, 2023 at 12:10

You cannot access tweets for free using the API. Twitter (X) charges developers at minimum $100/month to use the API to access tweets. The free developer option is limited to posting only, which is not what you’d want to scrape Twitter for anyway.

Cem Dilmegani
Nov 01, 2023 at 17:31

Indeed, we updated that section, thank you for the heads up!