AIMultiple Research

AIMultiple Invoice Capture Benchmark Methodology in 2024

AIMultiple aims to help buyers identify the right invoice capture solution for their business.

AIMultiple’s first invoice capture benchmark aims to help Forbes Global 2000 businesses that have multiple local subsidiaries in different countries and receive at least tens of thousands of PDF or mail invoices. The benchmark will assess these aspects for such businesses:

  • Financial gains including efficiency gains as well as early payment discounts
  • Increased transparency
  • Total cost of ownership

What will be the guiding principles?

AIMultiple’s benchmark methodology is designed for an objective and transparent assessment. It also explains participation requirements.

What will be benchmarked?

Invoice capture technology. AIMultiple will share invoices in the form of digital files (e.g. PDF) via APIs provided by the vendors.

What is the benchmark dataset?

Invoices in the dataset need to be:

  • representative of the invoices that a global corporation receives
  • numerous enough to ensure that the results are generalizable

Invoices will be representative from a country-of-origin perspective. The benchmark dataset will include at least 50 invoices per country. The list of countries starts with the 10 largest countries by exports as of 2022. Countries that have standardized invoices are then excluded. Standardized documents follow the same format and have the invoice data embedded in a file attached to the invoice. It is therefore trivial to capture data from such documents, and we exclude them because we don’t expect documents from these countries to be a differentiating factor in the benchmark. These exclusions remove China and Italy, leaving these countries from the top 10:

  1. United States
  2. Germany
  3. United Kingdom
  4. France
  5. Netherlands
  6. Japan
  7. Singapore
  8. Republic of Korea

No legal entity will be represented more than once as the issuer of an invoice. Most companies follow a single format for their invoices, so if one invoice from a company is processed correctly, other invoices from the same company are likely to be processed correctly too. However, AIMultiple will not track subsidiaries, so more than one invoice may be issued by the same controlling entity.
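The two dataset rules above (at least 50 invoices per country, no issuer repeated) can be checked mechanically. The following is a minimal sketch, not AIMultiple’s tooling: the record shape (a country paired with an issuer name) and the function name `validate_dataset` are assumptions made for illustration.

```python
from collections import Counter

# Countries remaining after the exclusions, as listed in the methodology.
COUNTRIES = [
    "United States", "Germany", "United Kingdom", "France",
    "Netherlands", "Japan", "Singapore", "Republic of Korea",
]
MIN_INVOICES_PER_COUNTRY = 50  # minimum stated in the methodology


def validate_dataset(invoices):
    """Return a list of problems; an empty list means both checks pass.

    `invoices` is an iterable of (country, issuer_legal_entity) pairs.
    This record shape is an assumption made for this sketch.
    """
    invoices = list(invoices)
    problems = []

    # Rule 1: at least 50 invoices for every country on the list.
    per_country = Counter(country for country, _ in invoices)
    for country in COUNTRIES:
        count = per_country[country]
        if count < MIN_INVOICES_PER_COUNTRY:
            problems.append(
                f"{country}: {count} invoices, need at least {MIN_INVOICES_PER_COUNTRY}"
            )

    # Rule 2: no legal entity appears more than once as issuer.
    per_issuer = Counter(issuer for _, issuer in invoices)
    for issuer, count in per_issuer.items():
        if count > 1:
            problems.append(f"{issuer}: appears {count} times as issuer")

    return problems


# Smallest dataset size that satisfies the per-country minimum.
print(len(COUNTRIES) * MIN_INVOICES_PER_COUNTRY)  # 400
```

With 8 countries and a minimum of 50 invoices each, the dataset holds at least 400 invoices from 400 distinct legal entities.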

Invoices are sourced from:

  • AIMultiple
  • AIMultiple’s suppliers, customers or partners

All participating companies will have provided consent giving AIMultiple the right to distribute these invoices to vendors participating in the benchmark.

What is required from the vendor solution?

Invoice data needs to be delivered in a widely accepted format, or a mapping to a widely accepted format should be provided, so that AIMultiple can automate the benchmark accuracy measurement.

For each field (e.g. gross amount) where the vendor makes a prediction, the vendor needs to provide a recommendation to the human-in-the-loop. The allowed recommendations are:

  • Skip this field, it is auto processed correctly
  • Check this field, it may be incorrectly processed

Vendors may provide this recommendation as a percentage, a traffic light, etc. In that case, the vendor needs to provide AIMultiple with the ruleset that converts the API output for this field into one of the two options above.
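As an illustration of what such a ruleset could look like, here is a minimal sketch. The 0.95 threshold, the colour names and the function names are hypothetical; the actual rules would come from each vendor.

```python
SKIP = "skip"    # "Skip this field, it is auto processed correctly"
CHECK = "check"  # "Check this field, it may be incorrectly processed"


def from_confidence(confidence, threshold=0.95):
    """Map a confidence score in [0, 1] to one of the two recommendations.

    The default threshold is a made-up value for illustration only.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    return SKIP if confidence >= threshold else CHECK


def from_traffic_light(colour):
    """Map a traffic-light value to a recommendation.

    Assumption: only green means no review is needed; yellow, red and
    any unknown value fall back to a human check.
    """
    return SKIP if colour.strip().lower() == "green" else CHECK


print(from_confidence(0.99))         # skip
print(from_confidence(0.80))         # check
print(from_traffic_light("Yellow"))  # check
```

Defaulting unknown values to a human check is the conservative choice in this sketch, since a skipped field with a wrong value goes straight through unreviewed.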

How will AIMultiple perform the benchmark?

AIMultiple’s invoice extraction benchmark aims to closely match the preferences of buyers. Buyers want a solution that provides maximum financial benefit at the optimal price point. Therefore, AIMultiple will measure these metrics:

Manual effort

With today’s technology, it is highly unlikely that all documents can be processed completely automatically (i.e. straight-through processing). Therefore, buyers will want to identify the solution that requires the least manual intervention. AIMultiple will calculate the manual effort, in human minutes, that the buyer’s team needs to spend to process the benchmark documents. This will be calculated separately for each vendor, based on these assumptions:

  • No human review recommended: No time spent
  • Correct prediction and human review is recommended: This will require a visual validation from the human. This is likely to take a few seconds; currently, we are assuming 5 seconds per data field.
  • Wrong prediction and human review is recommended: This will require a visual validation and a correction. This is likely to take longer than the option above; currently, we are assuming 15 seconds per data field. In every process involving humans, errors are to be expected, and we assume a 3% error rate.
  • Opening a document and checking a data field is likely to take longer than checking a data field on an open document. We assume that it will take 10 seconds to open a document.
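The assumptions above can be turned into a small calculation. This is a sketch under stated assumptions rather than AIMultiple’s actual implementation: it assumes a document is opened once, and only when at least one of its fields is flagged for review, and it does not model the 3% human error rate because the text does not say how that rate enters the time calculation. The class and function names are illustrative.

```python
from dataclasses import dataclass

# Time assumptions taken from the list above, in seconds.
SECONDS_OPEN_DOCUMENT = 10
SECONDS_VALIDATE_FIELD = 5   # correct prediction, review recommended
SECONDS_CORRECT_FIELD = 15   # wrong prediction, review recommended


@dataclass
class FieldResult:
    correct: bool             # did the prediction match the ground truth?
    review_recommended: bool  # did the vendor ask for a human check?


def document_effort_seconds(fields):
    """Manual effort for one document, in seconds."""
    flagged = [f for f in fields if f.review_recommended]
    if not flagged:
        return 0  # no review recommended: no time spent
    seconds = SECONDS_OPEN_DOCUMENT  # assumption: opened once per document
    for f in flagged:
        seconds += SECONDS_VALIDATE_FIELD if f.correct else SECONDS_CORRECT_FIELD
    return seconds


def benchmark_effort_minutes(documents):
    """Total manual effort across all documents, in human minutes."""
    return sum(document_effort_seconds(fields) for fields in documents) / 60


doc = [
    FieldResult(correct=True, review_recommended=False),
    FieldResult(correct=True, review_recommended=True),
    FieldResult(correct=True, review_recommended=True),
    FieldResult(correct=False, review_recommended=True),
]
print(document_effort_seconds(doc))  # 10 + 5 + 5 + 15 = 35
```

Keeping the three time constants at module level makes it easy for a buyer to substitute their own assumptions.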

Manual effort will be reported separately for critical data (e.g. amount to be paid, issuer, buyer etc.) in case the buyer wants to reduce the data collection effort. Finally, the number of documents that a human can process with these vendors will be extrapolated from the results.

The underlying data behind these calculations will also be reported in detail for customers that want to use their own assumptions. For each vendor, the percentage of correct, false and n/a predictions will be reported:

  • at each field level
  • as aggregate values for critical fields and for all fields
  • separately for cases where the vendor did or did not recommend a human review
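A breakdown like this can be produced with a simple tally. The sketch below is illustrative only: the record shape and the outcome labels are assumptions, and the exact meaning of "n/a" is not defined in the text.

```python
from collections import Counter, defaultdict

OUTCOMES = ("correct", "false", "n/a")


def outcome_shares(records):
    """Percentage of each outcome per (field name, review recommended) group.

    `records` is an iterable of (field_name, review_recommended, outcome)
    tuples; this shape is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for field_name, review_recommended, outcome in records:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        counts[(field_name, review_recommended)][outcome] += 1

    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {o: 100.0 * counter[o] / total for o in OUTCOMES}
    return shares


records = [
    ("gross_amount", True, "correct"),
    ("gross_amount", True, "correct"),
    ("gross_amount", True, "false"),
    ("gross_amount", True, "n/a"),
    ("gross_amount", False, "correct"),
]
for key, share in outcome_shares(records).items():
    print(key, share)
```

Aggregates for critical fields or for all fields follow the same pattern, with the field name replaced by a group label before tallying.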

Early payment discounts

Certain invoices will offer early payment discounts. If their number is deemed sufficient, these metrics will be calculated for each vendor:

  • Potential savings from correctly identified early payment discounts as a percentage of spending for these invoices
  • Share of invoices where early payment discounts were correctly identified
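One possible reading of these two metrics is sketched below. The text does not give a formula, so the definitions here are assumptions: savings are the invoice amount multiplied by the discount rate, counted only when the vendor identified the discount correctly, and spending is the total amount of all invoices that carry a discount. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DiscountInvoice:
    amount: float         # invoice amount
    discount_rate: float  # e.g. 0.02 for a 2% early payment discount
    identified: bool      # did the vendor identify the discount correctly?


def early_payment_metrics(invoices):
    """Return (savings as % of spending, % of invoices identified).

    `invoices` contains only invoices that carry an early payment discount.
    """
    invoices = list(invoices)
    if not invoices:
        raise ValueError("no invoices with early payment discounts")
    total_spend = sum(i.amount for i in invoices)
    captured = sum(i.amount * i.discount_rate for i in invoices if i.identified)
    identified = sum(1 for i in invoices if i.identified)
    return 100.0 * captured / total_spend, 100.0 * identified / len(invoices)


savings_pct, identified_pct = early_payment_metrics([
    DiscountInvoice(amount=1000.0, discount_rate=0.02, identified=True),
    DiscountInvoice(amount=1000.0, discount_rate=0.02, identified=False),
])
print(f"{savings_pct:.2f}% of spending saved, {identified_pct:.0f}% identified")
```

In this example one of two equally sized invoices is identified, so half of the available 2% discount is captured.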

Costly mistakes

If one of the critical fields in a document is extracted incorrectly and the vendor’s product does not recommend a human review of that field, the result is an incorrect transaction. Incorrect transactions are costly to roll back, so the number of incorrect transactions caused by each vendor will be reported.
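The rule above amounts to counting documents that contain at least one critical field that is both wrong and not flagged for review. A minimal sketch follows; the field names and the data layout are hypothetical, and the set of critical fields is only an example based on those mentioned earlier in the text.

```python
# Example critical fields; the identifiers are hypothetical.
CRITICAL_FIELDS = {"amount_to_be_paid", "issuer", "buyer"}


def count_incorrect_transactions(documents):
    """Count documents with a wrong, unflagged critical field.

    `documents` is a list of dicts mapping a field name to a
    (correct, review_recommended) pair of booleans.
    """
    count = 0
    for fields in documents:
        if any(
            name in CRITICAL_FIELDS and not correct and not review_recommended
            for name, (correct, review_recommended) in fields.items()
        ):
            count += 1  # counted once per document, however many fields are wrong
    return count


documents = [
    {"amount_to_be_paid": (False, False), "issuer": (True, False)},  # costly mistake
    {"amount_to_be_paid": (False, True), "issuer": (True, False)},   # wrong but flagged
    {"amount_to_be_paid": (True, False), "po_number": (False, False)},  # non-critical
]
print(count_incorrect_transactions(documents))  # 1
```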

Other metrics

Non-critical metrics will be provided for information:

  • Average document processing time by vendor
  • Distribution of document processing time

A maximum of 5 seconds will be allowed per page, which needs to cover data processing and transfer time.
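Processing time can be summarized per vendor along these lines. This sketch assumes the 5-second limit scales with the page count of each document; the text does not say what happens to documents that exceed it, so they are only counted here. The function name and the sample layout are illustrative.

```python
import statistics

MAX_SECONDS_PER_PAGE = 5  # limit stated in the methodology


def processing_time_summary(samples):
    """Summarize processing times for one vendor.

    `samples` is a list of (seconds, pages) pairs, one per document.
    """
    times = [seconds for seconds, _ in samples]
    over_limit = sum(
        1 for seconds, pages in samples if seconds > MAX_SECONDS_PER_PAGE * pages
    )
    return {
        "mean_seconds": statistics.mean(times),
        "median_seconds": statistics.median(times),
        "max_seconds": max(times),
        "documents_over_limit": over_limit,
    }


summary = processing_time_summary([(2.0, 1), (12.0, 2), (4.0, 1)])
print(summary)
```

The mean, median and maximum give a rough picture of the distribution; a histogram over `times` would show it in full.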


Public cost data published by the vendors will be used to calculate the cost of the benchmark. Each vendor’s cost model will also be shared to help buyers compare prices across vendors.

Customer service

Reviews on B2B review platforms will be analyzed to assess customer satisfaction.

How will the results be published?

The results will be published with graphs that users can leverage to find the right vendor for their business. Different metrics (e.g. manual effort) will be presented separately to create transparency for buyers.

Each participant will receive their detailed results at the data field level as well as the average results.

Open questions

Enterprises may be more interested in the long-term benefits of these solutions and may therefore want to see how they perform after training. A sample from the dataset could first be used with a human-in-the-loop to train the solutions. However, this would add significantly more effort to the benchmarking exercise, so whether to include it remains an open question.

Please note that AIMultiple is in the design phase of the benchmark and changes will be made as AIMultiple gets end user feedback and finalizes the benchmark.

Reach out to the AIMultiple team if you would like to participate in the AIMultiple invoice capture benchmark.

Cem Dilmegani
Principal Analyst

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 60% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
