AIMultiple Research

In-Depth Diffbot Review: Key Capabilities & Pricing in 2024

Burak Ceylan
Updated on Jan 4

Every online interaction, from browsing to search queries, generates data that can become a valuable asset for businesses. Extracting that data efficiently requires a web scraping tool that fits your data retrieval requirements. In this competitive market, Diffbot is a strong contender, offering a range of products and web scraping solutions for both organizations and individual users.

In this article, we examine Diffbot's features, pricing, strengths, and weaknesses as a web scraping tool.

Diffbot overview

A pioneer in applying machine learning and computer vision to web data, Diffbot offers public APIs that extract data from web pages and use it to build a knowledge base. The company uses AI to turn unstructured web content into structured knowledge graphs.

What does the company offer?

At its core, Diffbot offers algorithms that crawl the web, extract data from sources such as articles, discussions, and other page types, and convert that data into structured formats.
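To make the "web page in, structured data out" idea concrete, here is a minimal Python sketch of requesting extraction for one page and reading the structured result. It assumes Diffbot's v3 Article API at `https://api.diffbot.com/v3/article`, which takes `token` and `url` query parameters and returns JSON with an `objects` list; verify these details against Diffbot's API documentation before relying on them. `DIFFBOT_TOKEN`, `build_article_request`, `summarize_response`, and the sample response are hypothetical placeholders, and the sketch parses a canned response instead of making a network call.

```python
import json
from urllib.parse import urlencode

# Assumption: Diffbot's v3 Article API endpoint; check the official docs.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"
DIFFBOT_TOKEN = "YOUR_TOKEN"  # hypothetical placeholder, not a real token


def build_article_request(page_url: str, token: str = DIFFBOT_TOKEN) -> str:
    """Return the GET URL that asks the API to extract one article page."""
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def summarize_response(raw_json: str) -> list[dict]:
    """Pull a few structured fields out of an (assumed) v3 response body."""
    body = json.loads(raw_json)
    records = []
    for obj in body.get("objects", []):
        records.append({
            "type": obj.get("type"),
            "title": obj.get("title"),
            "author": obj.get("author"),
            # Keep only a short preview of the extracted body text.
            "preview": (obj.get("text") or "")[:60],
        })
    return records


if __name__ == "__main__":
    print(build_article_request("https://example.com/news/story"))
    # Hypothetical response shaped like the assumed v3 output.
    sample = json.dumps({
        "objects": [{
            "type": "article",
            "title": "Example headline",
            "author": "Jane Doe",
            "text": "Body text extracted from the page...",
        }]
    })
    print(summarize_response(sample))
```

Keeping the request builder separate from the parser means the parsing logic can be tested against saved responses without spending API credits.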

Features & key capabilities

Diffbot’s platform offers several features that help organizations access and use online data:

  • Knowledge graphs: One of Diffbot’s standout capabilities is building knowledge graphs. Diffbot uses web scraping to extract structured data from millions of web pages, such as articles, product listings, and profiles, and then organizes that data into entities and relationships. For instance, an entity could be a company, and its relationships could link it to its founders, products, and related news articles (Figure 1).
Figure 1. Source: Diffbot [1]

The graphs also capture the semantics of the data, that is, the context and connections between different pieces of information. As the web grows and new information emerges, Diffbot’s algorithms continuously crawl it and update the knowledge graph. Developers and businesses can query the knowledge graph through Diffbot’s APIs.

  • Crawlbot: A tool that automates large-scale web crawls. Users can configure it to crawl entire websites and extract data with Diffbot’s automatic or custom APIs.
  • Diverse data extraction: Beyond text, Diffbot can extract a variety of data types from the web, including images, videos, and discussion threads from sites across different industries.
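The entity-and-relationship model described under "Knowledge graphs" can be pictured as a set of labeled, directed edges between entities. The following Python sketch is a hypothetical in-memory model, not Diffbot's actual schema or API: the `Entity` and `KnowledgeGraph` classes, the relation labels `founded_by`, `makes`, and `mentioned_in`, and the sample entities are illustrative assumptions. It shows how storing facts as typed edges lets you answer a question such as "who founded this company?" by following one relationship.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A node in the graph, e.g. an organization, person, product, or article."""
    name: str
    kind: str


@dataclass
class KnowledgeGraph:
    """Hypothetical in-memory graph: adjacency lists keyed by (entity, relation)."""
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_fact(self, subject: Entity, relation: str, obj: Entity) -> None:
        # Store the fact as a directed, labeled edge: subject --relation--> obj.
        self.edges[(subject, relation)].append(obj)

    def related(self, subject: Entity, relation: str) -> list[Entity]:
        # Follow one relation type outward from a subject entity.
        return list(self.edges.get((subject, relation), []))


if __name__ == "__main__":
    acme = Entity("Acme Corp", "Organization")
    kg = KnowledgeGraph()
    kg.add_fact(acme, "founded_by", Entity("Ada Smith", "Person"))
    kg.add_fact(acme, "makes", Entity("Acme Washer 3000", "Product"))
    kg.add_fact(acme, "mentioned_in", Entity("Acme opens new plant", "Article"))
    print([e.name for e in kg.related(acme, "founded_by")])
```

Keying edges by (entity, relation) keeps one-hop lookups cheap. A production knowledge graph also needs entity resolution (deciding when two mentions refer to the same company) and a query interface, both of which this sketch omits.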

Case study: RelationalAI

Company: RelationalAI, an AI startup that combines databases, knowledge graphs, and artificial intelligence, has developed a relational reasoning engine for efficiently handling complex, linked data.

Challenge: RelationalAI collaborated with a major online retailer to improve the retailer’s product recommendations, but the retailer’s product data turned out to be insufficient. The initial dataset covered 206 washers with 266 attributes and 34k facts.

Solution: To close this data gap, RelationalAI used Diffbot’s Product extraction technology and Knowledge Graph. Starting from limited information about the 206 washers, Diffbot’s software searched the web and extracted detailed specifications such as brand, price, and capacity. See Figure 2:

Figure 2. Source: Diffbot [2]
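The enrichment step in this case study, filling gaps in sparse product records with attributes found on the web, can be sketched as a merge that keeps the retailer's existing values and only adds missing ones. This is a hypothetical illustration, not RelationalAI's or Diffbot's actual pipeline: the `enrich` and `count_facts` functions, the SKU keys, and the sample attribute values are made up, and in practice the extracted attributes would come from an extraction API rather than a hard-coded dict.

```python
def enrich(catalog: dict[str, dict], extracted: dict[str, dict]) -> dict[str, dict]:
    """Return a new catalog where missing attributes are filled from `extracted`.

    Existing non-empty retailer values win; extracted values only fill gaps.
    """
    enriched = {}
    for sku, attrs in catalog.items():
        merged = dict(extracted.get(sku, {}))  # start from web-extracted attributes
        # Overlay the retailer's own non-empty values so they take precedence.
        merged.update({k: v for k, v in attrs.items() if v not in (None, "")})
        enriched[sku] = merged
    return enriched


def count_facts(catalog: dict[str, dict]) -> int:
    """Count non-empty (product, attribute, value) facts."""
    return sum(
        1 for attrs in catalog.values() for v in attrs.values() if v not in (None, "")
    )


if __name__ == "__main__":
    # Hypothetical sparse retailer data for two washers.
    catalog = {
        "washer-001": {"brand": "BrandA", "price": None},
        "washer-002": {"brand": "", "capacity_cu_ft": 4.5},
    }
    # Hypothetical attributes extracted from product pages on the web.
    extracted = {
        "washer-001": {"brand": "BrandA", "price": 649.0, "capacity_cu_ft": 5.0},
        "washer-002": {"brand": "BrandB", "price": 799.0},
    }
    result = enrich(catalog, extracted)
    print(count_facts(catalog), "->", count_facts(result))
```

Letting the retailer's values take precedence is one design choice; a real pipeline might instead compare sources or flag conflicting values for review.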

Diffbot pricing

Diffbot offers pricing plans for companies of different sizes.[3] The plans differ in product access, usage and features, and support. The table below compares them:

Plan | Starting price/mo | Product access | Usage & features | Support
Plus | $299 | Extract; 25 crawls; Knowledge Graph research | API access; 1M credits; dashboard access | Email
Startup | $899 | Extract; datacenter proxies; third-party proxies; Knowledge Graph research | API access; 250k credits; dashboard access | Email
Enterprise | Custom | Extract; third-party proxies; 100+ crawls; Knowledge Graph research | API access; custom credits; dashboard access | Email; custom SLA

Free trial: Diffbot offers a free trial that includes:

  • 10,000 credits for 2 weeks
  • Knowledge graphs
  • Access to dashboard
  • Developer APIs

Diffbot performance evaluation 

Pros:

  • Stability: Because they are largely unaffected by changes in website design, Diffbot crawlers were found to be more stable than web scraping tools that rely on a page’s visual layout and HTML.[4]
  • Usability: Diffbot provides access to a large pool of company and contact information, and it continuously improves its user interface (Figure 3).[5]
Figure 3. Source: G2

Cons:

  • Learning curve: Users unfamiliar with Diffbot’s NLP API and Extract API may find them difficult to use properly. The Extract API relies on computer vision to interpret and scrape pages, which can be more complex than rule-based extraction methods.[6]
  • Proxy usage: Diffbot offers two levels of proxies: default and dynamic. Requests made through the default proxy servers are billed by the number of API calls, and each target page processed through a proxy counts as two API calls, so using proxies doubles your cost per page.
  • Extract API: In addition to its subscription plans, Diffbot charges customers per entity (Figure 4), which reviewers disliked.[7]
Figure 4. Source: Diffbot [8]
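The proxy surcharge described under "Proxy usage" is simple arithmetic: if each page fetched through a proxy counts as two API calls, a crawl's call count is the number of direct pages plus twice the number of proxied pages. The Python sketch below works this out. The page counts are hypothetical, and the sketch counts API calls only; it does not convert calls to credits or dollars, since that rate is not stated in this review.

```python
PROXY_MULTIPLIER = 2  # per the review: a proxied page counts as two API calls


def api_calls(direct_pages: int, proxied_pages: int) -> int:
    """Total billable API calls for a mix of direct and proxied page fetches."""
    if direct_pages < 0 or proxied_pages < 0:
        raise ValueError("page counts must be non-negative")
    return direct_pages + PROXY_MULTIPLIER * proxied_pages


if __name__ == "__main__":
    # Hypothetical crawl of 10,000 pages, 3,000 of which need a proxy.
    total_pages = 10_000
    proxied = 3_000
    print(api_calls(total_pages - proxied, proxied))  # 7,000 + 2 * 3,000 = 13,000
    print(api_calls(0, total_pages))  # all pages proxied: 20,000, double the direct count
```

In this example, a crawl where 30% of pages need a proxy uses 30% more API calls than a fully direct crawl, and a fully proxied crawl uses twice as many.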

Burak Ceylan
Burak is an Industry Analyst at AIMultiple. He received his master's degree in Political Science from Middle East Technical University. He has a background in researching location-based platforms.
