AIMultiple Research

Synthetic Data Statistics: Benefits, Vendors, Market Size [2024]

Synthetic data generation tools create artificial data to preserve privacy, to test systems, or to produce training data for machine learning algorithms. For more detailed information about synthetic data, please check our ultimate guide to synthetic data.

This article collects 26 up-to-date synthetic data statistics drawn from research by reputable sources. The list covers market size forecasts, why synthetic data matters, how protective it is, its benefits, and vendor funding and employee counts.

Market size forecasts

We expect the synthetic data market to keep growing by more than 10% per year, since most of it serves two segments:

  1. Test data management, which is expected to grow at 11.6% CAGR
  2. AI training data generation, which is expected to grow at 22.2% CAGR
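
To make the two growth rates concrete, the sketch below compounds them over five years. The starting value of 100 is a hypothetical index chosen for illustration, not a market-size figure from any source, and the function name `project` is our own.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound `value` forward at annual rate `cagr` for `years` years."""
    return value * (1 + cagr) ** years


# Hypothetical starting index of 100 for each segment (not a real market size).
BASE = 100.0
SEGMENTS = {
    "Test data management": 0.116,         # CAGR quoted in the list above
    "AI training data generation": 0.222,  # CAGR quoted in the list above
}

for name, rate in SEGMENTS.items():
    # Index value at the end of years 0 through 5.
    path = [round(project(BASE, rate, year), 1) for year in range(6)]
    print(f"{name}: {path}")
```

Because both segments compound at more than 10% per year, any weighted mix of the two also grows faster than 10% per year, which is the reasoning behind the forecast above.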

Why does synthetic data matter?

  • Gartner estimates that by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.
  • Mostly AI claims that synthetic data can retain 99% of the information and value of the original dataset while protecting sensitive data from re-identification. However, these results are based on a benchmark analyzed by their own team and the underlying data was not published. (Mostly AI)
  • “Companies only need 50% of their original, authentic training data to finish the formal training of their algorithms”, claims Yashar Behzadi, CEO of Neuromation, a synthetic data generation startup. (eeNews)
  • When training data is highly imbalanced (e.g. more than 99% of instances belong to one class) synthetic data generation is necessary to build accurate machine learning models. (Tensorflow)
  • Another important function of synthetic data is to keep data secure as:
    • 17% of the global online population were victims of digital theft over the last few decades and it is estimated that 80% of the cybercrimes are not reported. (UN)
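
The point above about highly imbalanced training data can be illustrated with a small sketch. It creates synthetic minority-class samples by interpolating between existing minority points, in the spirit of SMOTE-style oversampling. This is one common approach and not necessarily the method used in the cited source. The function name `smote_like`, its parameters, and the toy points are all hypothetical.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points by interpolating minority samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its `k` nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to `base`.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy minority class with two features (hypothetical values).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_like(minority_points, n_new=10)
print(len(new_points), "synthetic minority samples")
```

The synthetic samples would be appended to the minority class before training so that the classifier sees a more balanced class distribution.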

How protective is synthetic data?

Synthetic data can help close security gaps that traditional anonymization techniques cannot prevent. Using an AI-powered synthetic data generation tool can therefore protect sensitive data better. Vendor claims include:

  • 80% of credit card owners can be re-identified from 3 transactions when traditional anonymization techniques are used. (Mostly AI)
  • 51% of mobile phone owners can be re-identified by 2 antenna signals when traditional anonymization techniques are used. (Mostly AI)
  • 87% of all people can be re-identified by their birthday, gender and postcode when traditional anonymization techniques are used. (Mostly AI)
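
The re-identification claims above rest on the fact that a few attributes, taken together, are often unique to one person. The sketch below measures that on a toy table by counting how many records have a unique combination of birthday, gender, and postcode. The records and the function name `unique_fraction` are hypothetical and only illustrate the calculation; they do not reproduce the cited figures.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the share of records whose values for `keys` occur exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical "anonymized" table: names removed, quasi-identifiers kept.
records = [
    {"birthday": "1990-01-01", "gender": "F", "postcode": "10115"},
    {"birthday": "1990-01-01", "gender": "F", "postcode": "10115"},
    {"birthday": "1985-06-30", "gender": "M", "postcode": "10115"},
    {"birthday": "1972-11-12", "gender": "F", "postcode": "80331"},
]

share = unique_fraction(records, ("birthday", "gender", "postcode"))
print(f"{share:.0%} of records are unique on birthday, gender and postcode")
```

A record that is unique on these fields can be linked back to a person by anyone who knows those three facts about them, even though the name column has been removed.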

Synthetic data benefits

Numerous case studies demonstrate that synthetic data improves machine learning model accuracy.

  • Microsoft generated 2 million synthetic sentences to improve the translation of the Levantine dialect of Arabic. (Microsoft)
  • A 2020 study shows that using synthetic data improved machine learning model performance by up to 20% when categorizing actions in videos. (American University of Beirut)
  • Researchers were able to identify drivers of cars with 87% accuracy by analyzing synthesized sensor data generated by vehicles. (De Gruyter)
  • A study shows that using synthetic data reduced the false-positive rate from 60% to 20% when predicting volcanic eruptions. (ScienceMag)

Top synthetic data vendor funding stats

  • TwentyBN raised $12.5M (2 rounds)
  • Hazy raised $6.8M (5 rounds)
  • Mostly AI raised $31.1M (3 rounds)
  • AI.Reverie raised $5.8M (4 rounds)
  • DataGen Technologies raised $72M (3 rounds)

Top synthetic data vendors by number of employees

  • TwentyBN has 11-50 employees
  • Hazy has 11-50 employees
  • Mostly AI has 11-50 employees
  • AI.Reverie has 1-10 employees
  • DataGen Technologies has 11-50 employees

For more detailed information about synthetic data generator vendors, please check our synthetic data vendor selection guide.

Sources: Mostly AI*, Gartner, eeNews, Tensorflow, UN, Mostly AI**, Mostly AI***, Mostly AI****, Microsoft, American University of Beirut, De Gruyter, ScienceMag. Funding and number-of-employees data are from Crunchbase.

This article was originally written by former AIMultiple industry analyst Izgi Arda Ozsubasi and reviewed by Cem Dilmegani
