Updated on Apr 18, 2025

Playwright vs Puppeteer in 2025: Scraping & Automation

Playwright and Puppeteer are the most powerful open-source tools for controlling headless browsers. The main difference between them lies in cross-browser support and feature richness: Playwright supports multiple browser engines, whereas Puppeteer focuses primarily on Chromium-based browsers and offers a more straightforward experience.

Explore the key differences and similarities between Playwright and Puppeteer:

Main differences between Playwright and Puppeteer

Playwright and Puppeteer are both open-source Node.js libraries commonly used for web automation and web scraping. Both tools can control headless browsers, automate via the DevTools Protocol, and provide APIs for page and element interaction.

| Features | Playwright | Puppeteer |
|---|---|---|
| Maintainer | Microsoft | Google (Chrome team) |
| Browser Support | Chromium (Chrome, Edge), Firefox, and WebKit (Safari) | Primarily Chromium, limited Firefox support |
| Programming Languages | JavaScript/TypeScript, Python, Java, C# (official) | JavaScript/TypeScript (official); unofficial wrappers |
| Cross-browser Testing | Built-in; the same scripts run on all supported browsers | Limited (mostly Chromium-focused) |
| Mobile Browser Emulation | Native support for Chrome Android & Mobile Safari | Primarily Chrome Android emulation |
| Community & Ecosystem | Rapidly growing but newer | Larger, more mature ecosystem |
| GitHub statistics (April 2025) | 71.8k stars, 4.1k forks | 90.4k stars, 9.2k forks |
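As the table suggests, the two APIs are close cousins. The minimal sketches below perform the same task (open a page and print its title) in each library; run each as its own script, and both assume the package is installed locally:

// Puppeteer: open a page and print its title
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();

// Playwright: the same task with a near-identical API
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();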

What is Puppeteer?

Puppeteer is an open-source Node.js library that provides a user-friendly API to control headless Chrome or Chromium browsers over the DevTools Protocol or WebDriver BiDi.

Puppeteer can automate Chrome Extension testing and performance testing. Users can capture precise screenshots of entire pages or specific UI components.
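A minimal sketch of both screenshot modes; the h1 selector is only a placeholder for whichever component you want to capture:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Capture the entire scrollable page
  await page.screenshot({ path: 'page.png', fullPage: true });

  // Capture a single UI component (placeholder selector)
  const element = await page.$('h1');
  if (element) await element.screenshot({ path: 'element.png' });

  await browser.close();
})();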

Advantages of Puppeteer

  • Since Puppeteer is developed and maintained by Google, the tool quickly integrates the latest Chrome developments.
  • Runs Chrome/Chromium in headless mode by default, which suits scripted automation and scraping.
  • Offers full control over Chrome’s features, including clicking buttons, form submission, scrolling, and taking screenshots (see the sketch after this list).
  • For Chrome-only tasks, Puppeteer is slightly faster than Playwright.
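A minimal sketch of that kind of control, assuming a hypothetical login form (the URL and selectors are placeholders):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/login'); // placeholder URL

  // Fill and submit a form (placeholder selectors)
  await page.type('#username', 'demo');
  await page.type('#password', 'secret');
  await page.click('button[type="submit"]');

  // Scroll, then take a screenshot of the result
  await page.evaluate(() => window.scrollBy(0, 500));
  await page.screenshot({ path: 'after-login.png' });

  await browser.close();
})();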

Disadvantages of Puppeteer

  • Puppeteer does not support non-Chromium browsers such as Safari (WebKit), and its Firefox support is limited.
  • The primary language Puppeteer supports is JavaScript (and TypeScript via typings).
  • Puppeteer is tightly coupled with specific versions of Chromium or Firefox. If you want to test on older browser versions, you need to manage the browser binary manually (see the sketch after this list).
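A minimal sketch of pinning Puppeteer to a manually managed binary; the executablePath below is a placeholder for wherever the older build lives:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    executablePath: '/path/to/older/chrome', // placeholder path to your own build
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await browser.version()); // confirm which build actually launched
  await browser.close();
})();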

What is Playwright?

Playwright is an open-source, cross-browser automation and testing library developed by Microsoft. The tool enables developers to interact with all major browser engines: Chromium (Chrome, Edge), Firefox, and WebKit (Safari).

Playwright allows capturing screenshots of entire pages or specific elements, generating PDFs of pages, and recording videos of test sessions.
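A minimal sketch covering all three capabilities; note that page.pdf() only works in headless Chromium:

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch(); // headless by default
  // recordVideo makes the context record every page into the given directory
  const context = await browser.newContext({ recordVideo: { dir: 'videos/' } });
  const page = await context.newPage();
  await page.goto('https://example.com');

  // Full-page screenshot
  await page.screenshot({ path: 'page.png', fullPage: true });

  // PDF generation (headless Chromium only)
  await page.pdf({ path: 'page.pdf' });

  await context.close(); // finalizes the video file
  await browser.close();
})();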

Advantages of Playwright

  • Cross-browser and cross-language support: Playwright runs on multiple browsers and officially supports multiple programming languages, including JavaScript/TypeScript, Python, Java, and .NET (C#).
  • Built-in cross-browser testing: Developers can use the same scripts and tests across all supported browsers, both in visible (headed) and headless modes.
  • Native mobile browser emulation of Chrome for Android and Mobile Safari: Includes predefined device profiles for common mobile devices.
  • Built-in auto-wait: Auto-wait mechanisms ensure that elements become actionable before interactions occur (see the sketch after this list).
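A rough sketch of the auto-wait behavior: the click below waits until the link is attached, visible, and stable, with no explicit sleep or waitForSelector (the selector is a placeholder):

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // The locator auto-waits for the element to become actionable before clicking
  await page.locator('a').first().click();

  console.log(page.url()); // print where the click navigated to
  await browser.close();
})();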

Disadvantages of Playwright

  • PDF generation limitation: PDF export is only supported on headless Chromium; Firefox and WebKit currently do not support it.
  • Resource intensive: Launching multiple browser instances can consume significant memory and CPU.
  • Less mature ecosystem than Puppeteer’s: While Playwright has grown quickly in popularity since its initial release in early 2020, its community and third-party ecosystem are still smaller.

Automating News Headline Scraping with Playwright

In this example, we will:

  • Navigate to BBC News
  • Grab the top 5 headlines
  • Save them into a .txt file

Step 1: Install Node.js (if you haven’t already)

Check whether Node.js is installed: open your terminal (or command prompt) and type node -v. If a version number such as v22.11.0 appears, you’re all set. If you get an error like “command not found”, go to https://nodejs.org and download the LTS version.

node -v

Step 2: Create a Project Folder

Create a new folder (directory) for the project:

mkdir ~/Desktop/news-scraper

You can enter that folder in your terminal by running this command:

cd ~/Desktop/news-scraper

Step 3: Initialize the Project

The following command creates a file called package.json in your folder. It will track your project’s dependencies.

npm init -y

After running the command, a package.json file is created, containing the project’s name and version.
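The generated file looks roughly like this (the exact fields vary by npm version):

{
  "name": "news-scraper",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}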

Step 4: Install Playwright

npm install playwright

Next, install the browser binaries. The following command downloads Chromium, Firefox, and WebKit:

npx playwright install

Step 5: Create the Automation Script

After installing Playwright and the browser binaries, write your first script: it opens a browser, visits a website, collects headlines, and saves them to a file.

  1. Create a new, empty file named scrape.js:
touch scrape.js
  2. Open the file:
nano scrape.js

After running the command, the nano text editor opens, ready for you to type or paste your JavaScript code into scrape.js.

  3. Paste this code into the nano window:
const { chromium } = require('playwright');
const fs = require('fs'); // Import fs module to save data

(async () => {
  const browser = await chromium.launch(); // Launch the browser
  const page = await browser.newPage();   // Open a new page
  await page.goto('https://www.bbc.com/news'); // Navigate to BBC News

  // Scrape the first 5 headlines
  const headlines = await page.$$eval('.gs-c-promo-heading__title', elements =>
    elements.slice(0, 5).map(el => el.innerText.trim()) // Extract text of the top 5 headlines
  );

  // Save the headlines to a .txt file
  fs.writeFileSync('headlines.txt', headlines.join('\n'), 'utf8');
  console.log('✅ Headlines saved to headlines.txt');

  await browser.close(); // Close the browser
})();
  4. Press Ctrl + O, and then Enter, to save the file.
  5. After saving the file, press Ctrl + X to exit nano.

Step 6: Run the headline scraper

The scraper will navigate to the BBC News website, scrape the top headlines, and save them into a .txt file.

  1. Make sure you are in your project directory:
cd ~/Desktop/news-scraper
  2. Run the web scraping script:
node scrape.js
  3. You will see:
✅ Headlines saved to headlines.txt
  4. List all files in the current directory and confirm that headlines.txt is there:
ls
  5. Print the top 5 headlines in the terminal:
cat headlines.txt

Step 7: Save the extracted data as a text file (option 1)

  1. In the terminal, run ls to locate the headlines.txt file:
ls
  2. Copy the file to another location with the following command:
cp headlines.txt ~/Desktop

Export to CSV (option 2)

  1. Install the csv-writer package:
npm install csv-writer
  2. Modify the script with the following code for CSV export:
const { chromium } = require('playwright');
const fs = require('fs'); // For saving to .txt file
const { createObjectCsvWriter } = require('csv-writer'); // For CSV export

(async () => {
  const browser = await chromium.launch(); // Launch browser
  const page = await browser.newPage();   // Open a new page
  await page.goto('https://www.bbc.com/news'); // Navigate to BBC News

  // Extract the top 5 headlines
  const headlines = await page.$$eval('.sc-87075214-3', (elements) => {
    return elements.slice(0, 5).map(el => el.innerText.trim());
  });

  // Log the extracted headlines
  console.log('Extracted Headlines:', headlines);

  // Set up the CSV writer
  const csvWriter = createObjectCsvWriter({
    path: 'headlines.csv',
    header: [
      { id: 'headline', title: 'Headline' }
    ]
  });

  // Write the headlines to the CSV file
  const records = headlines.map(headline => ({ headline }));
  await csvWriter.writeRecords(records);
  console.log('✅ Headlines saved to headlines.csv');

  await browser.close(); // Close the browser
})();
  3. Run the script again:
node scrape.js

The CSV file will contain a single Headline column with the five extracted headlines.

Troubleshooting

Selector Not Found

If the output looks like the following, the script ran but didn’t extract any headlines from the target page:

Extracted Headlines: []
✅ Headlines saved to headlines.txt

The class used in the code may no longer be correct, since websites change their structure frequently.

How to fix the issue:

The $$eval function in Playwright takes a CSS selector that identifies the elements to extract. Inspect the page: right-click one of the headline elements and select Inspect. In the Developer Tools, look at the HTML structure and find the class name of the element, then update your script with the new selector.
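For example, if Developer Tools shows that the headlines now carry a different class, the fix is a one-line selector swap (the new class name below is only illustrative; it matches the one used in the CSV example above):

// Old selector that no longer matches anything:
// const headlines = await page.$$eval('.gs-c-promo-heading__title', ...);

// Updated selector found via Developer Tools:
const headlines = await page.$$eval('.sc-87075214-3', elements =>
  elements.slice(0, 5).map(el => el.innerText.trim())
);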

Combining Scraping and Automation in One Puppeteer Script

In this example, we will:

  • Navigate to the Real Python blog
  • Extract article titles, excerpts, and URLs

Step 1: Create a folder for your project

mkdir realpython-scraper

Then navigate into the folder:

cd realpython-scraper

Step 2: Initialize a New Node.js Project

This will hold your project’s dependencies:

npm init -y

After the package.json file is created, install Puppeteer in the folder by running:

npm install puppeteer

Step 3: Create the Scraping Script

  1. Create a new JavaScript file for the scraping script:
touch realpython-scraper.js
  2. Open the file:
nano realpython-scraper.js
  3. Paste the following code, then save and exit (Ctrl + O → Enter, Ctrl + X):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto('https://realpython.com/', {
    waitUntil: 'domcontentloaded'
  });

  await page.waitForSelector('.card.border-0');

  const articles = await page.$$eval('.card.border-0', cards => {
    return cards
      .filter(card => card.querySelector('h2.card-title')) // filter only articles
      .slice(0, 5)
      .map(card => {
        const title = card.querySelector('h2.card-title')?.innerText.trim();
        const excerpt = card.querySelector('p.card-text')?.innerText.trim();
        const url = card.querySelector('a')?.href;
        return { title, excerpt, url };
      });
  });

  console.log('\n📰 Top 5 Articles on Real Python:\n');
  articles.forEach((a, i) => {
    console.log(`${i + 1}. ${a.title}`);
    console.log(`   ${a.url}`);
    console.log(`   ${a.excerpt}\n`);
  });

  await browser.close();
})();

The script will extract:

  • Article titles, excerpts, and URLs of the top 5 posts.

Step 4: Run the Script

node realpython-scraper.js

Expected output: a numbered list of the top 5 Real Python articles, each with its URL and excerpt.

Troubleshooting

If you adapt the script to another site (for example, a job search page) and its selectors no longer match, you may see output like this:

  1. Extracted Job Listings: [] means the script didn’t find any job listings on the page.
  2. No element found for selector: #text-input-what indicates the form input for the job search couldn’t be found.

How to fix the issue:

  • Job listings scraping issue: The selector used for extracting the job titles may be outdated or incorrect. Inspect the page and update the script with the correct selector.

Many social media platforms, job search engines like Indeed, and e-commerce sites like Amazon use anti-bot measures to block automated requests. For example, Amazon may serve its error (“dog”) page, indicating that it has detected and blocked the request. Puppeteer, particularly in headless mode or with default settings, is easy for such websites to detect.
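A minimal mitigation sketch, not a guarantee against blocking: running headful and setting a realistic user agent makes the browser look less like a default automated instance. The user-agent string below is just an example:

const puppeteer = require('puppeteer');

(async () => {
  // Headful mode plus a realistic user agent looks less like default automation
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36' // example UA string
  );
  await page.goto('https://example.com');
  await browser.close();
})();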

Gülbahar is an AIMultiple industry analyst focused on web data collection, applications of web data and application security.
