AIMultiple Research

Invoice Capture: Benefits, Use Cases, & Top Vendors in 2024

Written by
Cem Dilmegani

Cem is the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per Similarweb) including 60% of Fortune 500 every month.

Cem's work focuses on how enterprises can leverage new technologies in AI, automation, cybersecurity (including network security, application security), data collection including web data collection and process intelligence.


AIMultiple team adheres to the ethical standards summarized in our research commitments.

Invoice capture is a growing area of AI, and it is where most companies make their first purchase of an AI product. This is because invoice capture is an easy-to-integrate solution with significant benefits.

While digitization helped automate numerous processes, it mostly relied on rule-based software. Invoice capture software is different: it involves both reading the invoice text with optical character recognition (OCR) and understanding its context with machine learning, including natural language processing (NLP).

In this article, we go in depth into the concept of invoice capture, explain the inner workings of invoice capture technology, showcase the top vendors, and more.

What is invoice capture?

Invoice capture (also called invoice data extraction or invoice OCR) is the extraction of structured data from invoices so that they can be processed automatically. Invoice capture has been the first back-office process to be automated with AI for most companies.

Invoice capture is only relevant for invoices that are received outside of an Electronic Data Interchange (EDI). Invoices that arrive via EDI can be auto-captured since they are already structured data (for example, XML files). For more on different types of invoices, feel free to read our article on invoices.

If there is significant uncertainty about the data, a human is notified to take a look at the invoice. If data extraction is deemed to be successful, data is fed to the record keeping and payment systems.

Companies need to set up quality assurance processes in any automated process where errors can be costly. Invoice capture is no exception. To ensure that wrong payments are not made, suspicious invoices and invoices that require payments beyond a certain limit would need to be reviewed by humans.
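The review logic described above can be sketched as a simple routing rule. This is a minimal illustration, not any vendor's implementation: the `ExtractedInvoice` type, the `route` function and both threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values depend on your company's risk appetite.
MIN_CONFIDENCE = 0.90      # below this, the extraction is treated as uncertain
REVIEW_AMOUNT = 10_000.00  # payments above this limit are always reviewed


@dataclass
class ExtractedInvoice:
    invoice_id: str
    total_amount: float
    confidence: float  # lowest field confidence reported by the extractor (0..1)


def route(invoice: ExtractedInvoice) -> str:
    """Return 'human_review' or 'auto_process' for an extracted invoice."""
    if invoice.confidence < MIN_CONFIDENCE:
        return "human_review"  # significant uncertainty about the data
    if invoice.total_amount > REVIEW_AMOUNT:
        return "human_review"  # payment beyond the review limit
    return "auto_process"      # feed record keeping and payment systems


print(route(ExtractedInvoice("INV-1", 250.0, 0.98)))
print(route(ExtractedInvoice("INV-2", 250.0, 0.61)))
print(route(ExtractedInvoice("INV-3", 50_000.0, 0.99)))
```

Only the first sample invoice is auto-processed; the second is uncertain and the third exceeds the review limit.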

How do machine learning powered invoice capture solutions work?

End-to-end automation of processes is possible thanks to hyperautomation, which combines technologies such as AI, OCR and RPA. Invoice automation is an example of hyperautomation. We have previously explained how hyperautomation works for processes triggered by incoming documents or email.

What are the benefits of invoice capture?

  • Reduces back-office costs by reducing manual effort
  • Allows employees to focus on higher value added activities
  • Reduces invoice processing errors
  • Allows faster turn-around time, which prevents back and forth between suppliers and employees
  • Allows auditability by storing invoice data with visual bounding boxes that show where data was extracted from the invoice.
    • In case the company discovers that data extraction had faults, these documents can be used to understand the source of errors which can be corrected for future invoices.
  • Enables companies to run compliance checks on invoice data by capturing all data that could go unnoticed if done manually.
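As an illustration of the auditability point, an extracted field can be stored together with the location it was read from. The sketch below is an assumption about what such a record could look like; the `FieldExtraction` type, its field names and the sample coordinates are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FieldExtraction:
    name: str    # field name, e.g. "total_amount"
    value: str   # text as read from the invoice
    page: int    # 1-based page number
    # Bounding box (left, top, width, height) as fractions of the page size.
    bbox: tuple


fields = [
    FieldExtraction("vendor_name", "ACME GmbH", 1, (0.08, 0.05, 0.30, 0.03)),
    FieldExtraction("total_amount", "1,190.00", 1, (0.72, 0.88, 0.15, 0.02)),
]

# Serialize for storage so an auditor can later trace each value to its source.
audit_record = json.dumps([asdict(f) for f in fields], indent=2)
print(audit_record)
```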

What are the differences between invoice capture and OCR?

While OCR captures text, invoice capture solutions capture the key-value pairs and tables that are required to process invoices automatically.

Capturing key-value pairs

Invoices include key-value pairs such as company name and bank account number. Invoice capture solutions can extract these key-value pairs from documents.

Image shows how invoice capture software automatically extracts key-value data from different documents.
Source: Amazon AWS Textract
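To make the distinction concrete, the toy sketch below contrasts raw OCR output (a list of text lines) with key-value pairs. The "Label: value" splitting rule and the sample lines are assumptions for illustration only; invoice capture solutions rely on machine learning rather than a rule this simple.

```python
# Raw OCR output: just lines of text, with no notion of fields.
ocr_lines = [
    "ACME GmbH",
    "Invoice number: 2024-0042",
    "IBAN: DE00 0000 0000 0000 0000 00",
    "Thank you for your business",
]


def to_key_values(lines):
    """Turn 'Label: value' lines into a dict; ignore lines without a label."""
    pairs = {}
    for line in lines:
        key, separator, value = line.partition(":")
        if separator and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs


key_values = to_key_values(ocr_lines)
print(key_values)
```

The two lines without a label are dropped, which is exactly the kind of content a naive rule cannot interpret.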

Capturing tables

Most invoices include an itemized list of services or products provided. Invoice capture solutions can recognize these itemized lists and process them.

Image shows how invoice capture software extracts tabular data.
Source: Amazon AWS Textract
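Once an itemized list is available as structured rows, it can be processed like any other table. The sketch below assumes a hypothetical row layout (`description`, `quantity`, `unit_price`, `amount`) and shows one possible processing step: checking each row for internal consistency and summing the table.

```python
from decimal import Decimal

# Hypothetical line items as they might look after table extraction.
line_items = [
    {"description": "Consulting", "quantity": Decimal("10"),
     "unit_price": Decimal("120.00"), "amount": Decimal("1200.00")},
    {"description": "Travel", "quantity": Decimal("1"),
     "unit_price": Decimal("85.50"), "amount": Decimal("85.50")},
]


def inconsistent_rows(rows):
    """Return indexes of rows where quantity * unit_price != amount."""
    return [
        index for index, row in enumerate(rows)
        if row["quantity"] * row["unit_price"] != row["amount"]
    ]


def table_total(rows):
    """Sum the amount column of the extracted table."""
    return sum((row["amount"] for row in rows), Decimal("0"))


print(inconsistent_rows(line_items))
print(table_total(line_items))
```

`Decimal` is used instead of `float` so that currency amounts compare exactly.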

What are different types of invoice capture solutions?

In the invoice automation landscape, there are 3 types of solutions:

  1. Template based solutions: The end user inputs the document structure to the software. These solutions were prevalent before the recent rise of machine learning solutions. However, they are no longer relevant since:
    • There are many different structures for invoices and these structures tend to change over time. This results in errors.
    • Using templates creates a code base that needs to be maintained.
    • Inputting template structures to the software is additional work. Ideally, automation solutions should not create new manual tasks for users.
  2. Pre-trained machine learning (ML) solutions: Companies build automation solutions based on millions of invoices. These solutions work well but can run into issues when they face types of invoices they have not encountered before.
  3. Continuously trained ML solutions: The best solutions on the market. They are trained on millions of invoices, and developers work with their customers to ensure that their solution is constantly trained on new invoices.
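The weakness of template based solutions can be shown with a small sketch. The template below is a hypothetical set of regular expressions written for one supplier's layout; it is not taken from any product. It works on the layout it was written for and silently returns nothing once the labels change.

```python
import re

# Hypothetical template for a single supplier's invoice layout.
TEMPLATE = {
    "invoice_number": re.compile(r"Invoice No\.\s*(\S+)"),
    "total_amount": re.compile(r"Total amount\s*([\d.,]+)"),
}


def extract_with_template(text):
    """Apply each pattern; a field is None when its pattern does not match."""
    result = {}
    for field, pattern in TEMPLATE.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


known_layout = "Invoice No. A-17\nTotal amount 1,190.00"
changed_layout = "Invoice # A-18\nGross amount 2,380.00"

print(extract_with_template(known_layout))
print(extract_with_template(changed_layout))
```

Every new or changed layout requires another template, which is the maintenance burden described in the list above.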

What type of companies provide invoice capture solutions?

Established accounts payable tech companies

These companies were the first to provide invoice data extraction solutions. Since their solutions were the first on the market, some are dated and rely on templates.

Tech giants

Amazon AWS Textract is a newcomer in the field. Amazon also brings the ability to combine Textract with other services like Amazon SageMaker Ground Truth. For example, Ground Truth could provide human validators to check documents that Textract cannot process with a high level of confidence. This combination of services could allow companies to completely outsource their document processing. Such combined services can also be built on top of other companies’ solutions since most invoice capture solutions support APIs.

Startups

Startups leverage machine learning to build flexible solutions. With the increasing commercialization of AI over the last 10 years, AI has increasingly been applied to extracting structured data from semi-structured data. Outsiders could see startups as doomed after Amazon’s entry into the business. However, startups still have major advantages when compared to Amazon:

What is the complete list of companies that provide invoice capture solutions?

Below, you can find our initial list on the topic. Later we expanded our research and we are now keeping a regularly updated list of invoice capture vendors.

| Company | Number of employees on LinkedIn | Area of focus | Pricing | Largest customers | On-prem solution | Type of solution |
|---|---|---|---|---|---|---|
| Amazon AWS Textract | N/A | Document data extraction | $0.05 per page** | Roche | Possible with AWS Outposts*** | Pre-trained ML |
| Coupa | 1000+ | B2B spend management | | | | Template based |
| Datamolino | 11-50 | Bookkeeping automation | | | | Not template based |
| Docparser | 1-5 | Document data extraction | $0.05 per document (up to 5 pages per document) | SMEs | N/A | Template based |
| Docucharm | 1-5 | Document data extraction | | | N/A | Continuously trained ML |
| Hypatos | 11-50 | Document data extraction & advanced processing | Community Edition is free | PwC, Schwarz Gruppe | Available | Continuously trained ML |
| Instabase | 11-50 | Document data extraction | | | | |
| pdfdata.io | 1-5 | Document data extraction | | | | Template based |
| Proactis | 501-1000 | B2B spend management | | Numerous Fortune 500 | Available | |
| SapphireOne | 1-5 | ERP, CRM, DMS and Business Accounting Software | | | | Template based |
| Tabula (open source) | Not applicable | Table extraction | | | | Template based |
| Tipalti | 100-500 | B2B spend management | | | | |
| Xtracta | 11-50 | Document data extraction | | | Available | |
* According to case studies
** Including key value pair+table extraction at a volume of 1M+ pages/month
*** Outposts was announced at AWS re:Invent 2018 but is not yet available. RDS, ECS, EKS, SageMaker and EMR have been announced as the first services to be available after launch.

How to choose your invoice capture vendor?

1. Choose a provider that supplies a solution in line with your company’s data privacy policies.

Your company’s data privacy policy can be a show-stopper to using external APIs such as Amazon AWS Textract. Most providers offer on-premise solutions so data privacy policies would not necessarily stop your company from using an invoice capture solution.

2. Choose a provider that can provide a consistent data structure regardless of the text on the documents.

There are two ways that deep learning based invoice capture solutions work. Solutions like Textract return key-value pairs as they appear on the document. For example, if one invoice calls the total amount “Gross amount”, another calls it “Total amount” and a German invoice calls it “Summe”, Textract gives you the data in 3 different structures for these 3 documents: a key-value pair with the key “Gross amount” in the first, “Total amount” in the second and “Summe” in the German one. Other providers have designed consistent data structures that work for all invoices. In all 3 scenarios, you would get “Total amount”, which is the key they use in their output file. This makes analytics and processing easier as you don’t need to deal with many differently structured data formats.
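The difference between the two output styles can be sketched as a normalization step. The alias table and the canonical key `total_amount` below are assumptions for illustration; providers with a consistent data structure presumably learn this mapping rather than maintain a lookup table by hand.

```python
# Hypothetical mapping from labels seen on invoices to one canonical key.
KEY_ALIASES = {
    "gross amount": "total_amount",
    "total amount": "total_amount",
    "summe": "total_amount",
}


def normalize(raw_pairs):
    """Map document-specific keys to canonical keys; drop unknown keys."""
    normalized = {}
    for key, value in raw_pairs.items():
        canonical = KEY_ALIASES.get(key.strip().lower())
        if canonical is not None:
            normalized[canonical] = value
    return normalized


# Three invoices, three different labels for the same field.
raw_outputs = [
    {"Gross amount": "1,190.00"},
    {"Total amount": "540.00"},
    {"Summe": "2.380,00"},
]

for raw in raw_outputs:
    print(normalize(raw))
```

After normalization, all three documents expose the same key, so downstream analytics only needs to handle one structure.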

3. Ask for the false positive and manual data extraction rates

Then run a Proof of Concept (PoC) project to see the actual rates on the invoices received by your company.

  • False positives are invoices that are auto-processed but have errors in data extraction. These are difficult to identify and can disrupt operations. For example, incorrect extraction of payment amounts would be problematic. Minimizing false positives should be the top priority.
  • Manual data extraction is necessary when the automated data extraction system has limited confidence in its result. This could be due to a different invoice format, poor image quality or a misprint by the supplier. This is also important to minimize, but there’s a trade-off between false positives and manual data extraction. Having more manual data extraction can be preferable to having false positives.

This is the first quantitative benchmarking we have seen in this space, and we will follow a similar methodology to prepare our own benchmarking.
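The false positive and manual data extraction rates from step 3 can be computed from PoC results along the following lines. This is a sketch under stated assumptions: each sample invoice is described by two hypothetical flags (`auto_processed`, and `has_error` as found by a manual audit), and both rates are expressed as a share of all invoices in the sample.

```python
def poc_rates(results):
    """Compute false positive and manual extraction rates for a PoC sample."""
    total = len(results)
    false_positives = sum(
        1 for r in results if r["auto_processed"] and r["has_error"]
    )
    manual = sum(1 for r in results if not r["auto_processed"])
    return {
        "false_positive_rate": false_positives / total,
        "manual_rate": manual / total,
    }


# Hypothetical sample of 10 invoices: 7 auto-processed correctly,
# 1 auto-processed with an error, 2 sent to manual data extraction.
sample = (
    [{"auto_processed": True, "has_error": False}] * 7
    + [{"auto_processed": True, "has_error": True}]
    + [{"auto_processed": False, "has_error": False}] * 2
)

print(poc_rates(sample))
```

Raising the confidence threshold of the extraction system would typically move invoices from the first group to the second, which is the trade-off described above.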

4. Leverage a PoC to measure the potential automation rate

This depends on the number of fields you expect to capture from the documents. A typical set of ~10 fields, including items like purchase order ID and vendor name, can enable data entry into ERP and payment systems. Best-practice vendors achieve ~80% straight-through processing (STP) by extracting all of these ~10 fields with almost no errors ~80% of the time. Though there may be errors from time to time, manually checking the largest payments can ensure that no significant wrong payment slips through the net.
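The automation rate can be measured in a PoC by comparing extracted fields with manually verified values. The sketch below assumes that STP stands for straight-through processing, meaning an invoice counts only if every required field is extracted correctly; the field list is a shortened, hypothetical stand-in for the ~10 fields mentioned above.

```python
# Hypothetical subset of the fields required for ERP entry and payment.
REQUIRED_FIELDS = ["purchase_order_id", "vendor_name", "total_amount"]


def is_straight_through(extracted, verified):
    """True only if every required field matches the verified value."""
    return all(extracted.get(f) == verified[f] for f in REQUIRED_FIELDS)


def stp_rate(pairs):
    """Share of invoices where all required fields were extracted correctly."""
    return sum(is_straight_through(e, v) for e, v in pairs) / len(pairs)


verified = {"purchase_order_id": "PO-1", "vendor_name": "ACME GmbH",
            "total_amount": "1190.00"}
correct = dict(verified)
wrong_amount = dict(verified, total_amount="1790.00")

# Four invoices extracted correctly, one with a wrong amount.
poc_pairs = [(correct, verified)] * 4 + [(wrong_amount, verified)]
print(stp_rate(poc_pairs))
```

Because a single wrong field disqualifies the whole invoice, the rate falls as the number of required fields grows.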

5. Ask for advanced processing options provided by the vendor 

Extraction is the first step in data collection; in most cases, it needs to be followed by data processing. For example, invoices need to be checked for VAT compliance (e.g. domestic invoices without VAT need to explain why VAT is excluded), and failure to do so could result in significant fines for the company depending on the country.
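A processing step like the VAT example could be expressed as a rule applied to the extracted data. The sketch below is a simplified illustration: the field names (`is_domestic`, `vat_amount`, `vat_exemption_reason`) are hypothetical, and actual VAT rules differ by country.

```python
from typing import Optional


def vat_issue(invoice: dict) -> Optional[str]:
    """Return a description of the VAT issue, or None if the check passes."""
    missing_vat = invoice["vat_amount"] == 0
    explained = bool(invoice.get("vat_exemption_reason"))
    if invoice["is_domestic"] and missing_vat and not explained:
        return "domestic invoice without VAT and without an explanation"
    return None


compliant = {"is_domestic": True, "vat_amount": 190.0}
exempt = {"is_domestic": True, "vat_amount": 0,
          "vat_exemption_reason": "Small business exemption"}
suspicious = {"is_domestic": True, "vat_amount": 0}

for invoice in (compliant, exempt, suspicious):
    print(vat_issue(invoice))
```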

6. Ask for how the solution learns about new invoices

The best solutions have an interface that lets your team guide the solution. As your company’s employees pick the key-value pairs, the invoice capture solution takes note so it can be more confident about a similar invoice next time.

7. Evaluate the ease-of-use of their manual data entry solution

It will be used by your company’s back-office personnel as they manually process invoices that cannot be automatically processed with confidence.

Beyond this, best practice procurement questions make sense. For example:

  • How widely adopted is their solution? Do they have Fortune 500 customers?
  • Are their customers happy with their solution and support? It could be good to ask an acquaintance from a company that is already using their solution. Since invoice automation is not a solution that would improve the marketing or sales of a company, even competitors could share their views of invoice automation solutions with one another.
  • What are the options to integrate the solution to your company’s systems (e.g. ERP)? Is IT on-board with the integration approach?
  • What is their Total Cost of Ownership (TCO)? Different solutions use different units of pricing (e.g. price per page or price per document), which makes this comparison difficult. However, using a sample from your archives, you can estimate the cost.
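The pricing comparison in the last question can be sketched with a sample of page counts from your archives. The two prices below are the list prices from the vendor table above; the sample and the way documents longer than the included page count are billed (as several documents) are assumptions, so check the vendor's actual terms.

```python
import math
from decimal import Decimal


def cost_per_page(page_counts, price_per_page):
    """Cost when the vendor charges for every page."""
    return sum(page_counts) * price_per_page


def cost_per_document(page_counts, price_per_document, pages_included):
    """Cost when the vendor charges per document up to a page limit.

    Assumption: a longer document is billed as several documents.
    """
    billed = sum(math.ceil(pages / pages_included) for pages in page_counts)
    return billed * price_per_document


# Hypothetical sample: number of pages in five archived invoices.
sample_pages = [1, 2, 1, 7, 3]

print(cost_per_page(sample_pages, Decimal("0.05")))
print(cost_per_document(sample_pages, Decimal("0.05"), pages_included=5))
```

Scaling the result by your monthly invoice volume gives a rough estimate of the usage-based part of the TCO; integration and review effort would come on top.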


Cem Dilmegani
Principal Analyst

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.

Cem's hands-on enterprise software experience contributes to the insights that he generates. He oversees AIMultiple benchmarks in dynamic application security testing (DAST), data loss prevention (DLP), email marketing and web data collection. Other AIMultiple industry analysts and tech team support Cem in designing, running and evaluating benchmarks.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led the technology strategy and procurement of a telco while reporting to the CEO. He has also led the commercial growth of deep tech company Hypatos, which went from 0 to a 7-digit annual recurring revenue and a 9-digit valuation within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.

Sources: Traffic Analytics, Ranking & Audience, Similarweb.
Why Microsoft, IBM, and Google Are Ramping up Efforts on AI Ethics, Business Insider.
Microsoft invests $1 billion in OpenAI to pursue artificial intelligence that’s smarter than we are, Washington Post.
Data management barriers to AI success, Deloitte.
Empowering AI Leadership: AI C-Suite Toolkit, World Economic Forum.
Science, Research and Innovation Performance of the EU, European Commission.
Public-sector digitization: The trillion-dollar challenge, McKinsey & Company.
Hypatos gets $11.8M for a deep learning approach to document processing, TechCrunch.
We got an exclusive look at the pitch deck AI startup Hypatos used to raise $11 million, Business Insider.


Janice Toh
Jun 22, 2021 at 09:25

Hello Cem,

This is a very good list and is referred to in my own presentations.
At Contactous, we have been serving customers in this domain for 5 years now.
For data extraction and parsing, we have focused on on-prem solutions.

May I request you to add our name to this list please.
I have filled the details for the table:

Company: Contactous
Number of Employees on LinkedIn: 1-5
Area of Focus: Data Quality, Extraction and Parsing
Largest Customers: Mead-Johnson, KPMG
On-Prem Solution: Available
Type of Solution: Template based

Thank you.


Cem Dilmegani
Jun 26, 2021 at 12:38

Hi Janice, thanks for your comment. Why don’t you sign up at so we include your company in relevant lists?
