OCR accuracy is critical for many document processing tasks, and SOTA multi-modal LLMs now offer an alternative to traditional OCR. We tested leading OCR services to identify their accuracy levels across different document types:
- Printed text:
  - All solutions achieve >95% accuracy.
  - Recommendation: a free solution like Tesseract.
- Printed media:
  - Accuracy range: ~60% to ~90%.
  - Recommendation: GCP, AWS, or Azure OCR services, or multi-modal LLMs like GPT-4o or Claude Sonnet 3.7.
- Handwriting:
  - Accuracy range: ~20% to ~96%.
  - Recommendation: same as printed media, but multi-modal LLMs have slightly higher accuracy (a minimal sketch of LLM-based OCR follows this list).
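For illustration, here is a minimal sketch of the LLM route using OpenAI's Python SDK and GPT-4o; the prompt wording and the poster.png file name are our own illustrative choices, not part of the benchmark code:

```python
# Hypothetical example: OCR via a multi-modal LLM (GPT-4o) using the
# OpenAI Python SDK. The prompt and file name are illustrative choices.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def llm_ocr(image_path: str) -> str:
    """Send an image to GPT-4o and return the transcribed text."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe all text in this image. Return only the text."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


print(llm_ocr("poster.png"))  # e.g., a printed-media sample
```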
2025 OCR benchmark results
Product names were shortened above; their full names are listed below. We used the latest versions available as of March 2025:
- Claude Sonnet 3.7
- OpenAI GPT-4o
- Amazon Textract API
- Google Cloud Vision API
- Microsoft Azure Document Intelligence API
- Mistral OCR
- Pytesseract
We calculated accuracy as a percentage for each category: printed text, printed media, and handwriting. For the overall results, we pooled the scores from all three categories, so the overall figures are calculated over the full dataset.
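As a sketch, the aggregation can be expressed as follows; the per-document scores below are placeholders, not our measured values:

```python
# Illustrative aggregation only: the per-document scores below are
# placeholders, not our measured values.
from statistics import mean

scores = {
    "printed_text": [0.99, 0.98, 0.99],
    "printed_media": [0.85, 0.78, 0.91],
    "handwriting": [0.70, 0.55, 0.82],
}

# Per-category accuracy is the mean similarity score over that category.
per_category = {cat: mean(vals) for cat, vals in scores.items()}
# The overall score pools every document across all three categories.
overall = mean(s for vals in scores.values() for s in vals)

for cat, acc in per_category.items():
    print(f"{cat}: {acc:.1%}")
print(f"overall: {overall:.1%}")
```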
We were surprised by Mistral OCR’s poor performance given its high scores on open source benchmarks. When examining failure cases, we found it struggled with cursive handwriting and disorganized text layouts (mixed line ordering, inconsistent capitalization) in the handwriting category. Similar issues occurred with printed media, particularly in low-resolution images and those containing multiple font styles.
Dataset
We used 30 documents across 3 categories in this benchmark:
- Printed text: letters, website screenshots, emails, reports, etc.
- Printed media: posters, book covers, advertisements, etc. We aimed to see how well the OCR tools handle different text fonts and placements.
- Handwriting: some handwritten documents in the source library were not easy to read, so our team manually prepared similar, human-legible samples. Half of the samples were cursive and the other half were print style (i.e., separate letters were not connected).
Files in the first two categories were sourced from the Industry Documents Library (IDL).1

Methodology
This benchmark focuses on the text extraction accuracy of the products.
Preprocessing: This was done only for the handwriting category. We photographed the handwritten documents with our smartphones and processed them with a mobile scanner app, CamScanner (a rough open-source equivalent is sketched after this list):
- Pictures were converted to black-and-white.
- The contrast was increased and the background was removed.
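For readers who want to reproduce a similar pipeline without CamScanner, here is a rough OpenCV approximation of these steps; the contrast factor and threshold parameters are illustrative guesses, not the app's actual settings:

```python
# Rough OpenCV approximation of the scanner-app preprocessing described
# above; the contrast factor and threshold parameters are illustrative.
import cv2


def preprocess(in_path: str, out_path: str) -> None:
    img = cv2.imread(in_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to black-and-white
    contrasted = cv2.convertScaleAbs(gray, alpha=1.5, beta=0)  # boost contrast
    # Adaptive thresholding removes the paper background, keeping ink strokes.
    clean = cv2.adaptiveThreshold(
        contrasted, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,
        31, 15)
    cv2.imwrite(out_path, clean)


preprocess("handwriting_raw.jpg", "handwriting_clean.png")
```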
OCR: We ran all the products on the same dataset and saved the outputs as raw text (i.e., .txt) files. We then manually prepared the ground truth containing the correct text for each file; the ground truth was verified twice by human reviewers.
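For the open-source baseline, the OCR step looks roughly like this with Pytesseract; the dataset and outputs folder names are illustrative, and the cloud products were run analogously through their own SDKs:

```python
# Sketch of the OCR step for the open-source baseline; "dataset" and
# "outputs" are illustrative folder names.
from pathlib import Path

import pytesseract
from PIL import Image

out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

for image_path in sorted(Path("dataset").iterdir()):
    if image_path.suffix.lower() not in {".jpg", ".png"}:
        continue
    text = pytesseract.image_to_string(Image.open(image_path))
    (out_dir / f"{image_path.stem}.txt").write_text(text, encoding="utf-8")
```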
Comparison: We compared each product's output with the ground-truth text to measure its accuracy. We used the similarity function from the spaCy library in Python and took the similarity score between each product's output and the ground truth as the text accuracy.
The similarity function computes the cosine similarity between the vector representations of the two texts. We did not use Levenshtein distance for this benchmark because different products output text in different orders.
Levenshtein distance penalizes such ordering differences, but we are only interested in how accurately the text is detected, not where it appears on the page. Cosine similarity applies a negligible penalty in these cases, so we chose it for this benchmark.
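A minimal sketch of this comparison, assuming the en_core_web_md model (which ships with word vectors) and the python-Levenshtein package; the sample strings are invented to show the ordering effect:

```python
# Sketch of the comparison step. en_core_web_md ships with word vectors
# (python -m spacy download en_core_web_md); the sample strings are
# invented to show the ordering effect.
import Levenshtein  # pip install python-Levenshtein
import spacy

nlp = spacy.load("en_core_web_md")

ground_truth = "Total due 42.50 Invoice 1187 Acme Corp"
ocr_output = "Invoice 1187 Acme Corp Total due 42.50"  # same words, reordered

cosine = nlp(ground_truth).similarity(nlp(ocr_output))
edit = Levenshtein.ratio(ground_truth, ocr_output)

print(f"cosine similarity: {cosine:.3f}")  # ~1.0: word order barely matters
print(f"Levenshtein ratio: {edit:.3f}")    # noticeably lower: order is penalized
```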
Product selection
There are many OCR products on the market. We focused on the ones that can output raw text results. The products for this benchmark were chosen based on:
- Capability to extract raw text. We did not include solutions that only extract machine-readable output (i.e., structured data) in this comparison.
- Their popularity in the market
This is not a comprehensive market review and we may have excluded some products with significant capabilities. If that is the case, please leave a comment and we will be happy to expand the benchmark.
Limitations
Advanced capabilities like text location detection, key-value pairing, or document classification were not evaluated in this benchmark.
The sample size will be increased in the next iteration. If you are looking for OCR for handwriting, see our handwriting OCR benchmark with 50 samples.
You can also see our invoice OCR benchmark and receipt OCR benchmark if you are interested.
Previous OCR benchmark results

- Google Cloud Vision and AWS Textract are the leading technologies in the market for all cases.
- ABBYY also shows high performance on non-handwritten documents.
- All benchmarked OCRs, including the open-source Tesseract, performed well on digital screenshots.
Google Cloud Platform’s Vision OCR tool has the highest text accuracy, at 98.0%, when the whole dataset is tested. While all products perform above 99.2% in Category 1, which contains typed text, the handwritten images in Categories 2 and 3 create the real difference between the products.
The overall results show that GCP Vision and AWS Textract are the dominant OCR products, recognizing the given text most accurately.
Notes from the overall results:
- There was a single instance in which AWS Textract failed to recognize the handwritten text. This outlier significantly reduces AWS Textract’s category and overall performance. It also increases the deviation for the category and overall, because AWS Textract performs very well in all other instances.
- Azure is the leading product in Category 1 with 99.8% accuracy. However, it usually fails to recognize handwritten text, as seen in the results for the second category. This is why Azure falls behind in the third category and in the overall results.
- Tesseract OCR is an open-source product that can be used for free. Compared to Azure and ABBYY, it performs better on handwritten instances and can be considered for handwriting recognition if the user cannot access AWS or GCP products. However, it may perform worse on scanned images.
- Unlike the other products, ABBYY outputs a more structured .txt file, taking into account the location of the text on the image. While the product has further useful capabilities, we focused only on text accuracy in this benchmark, and ABBYY performed poorly in handwriting recognition.
Removing the “trouble-maker” image
As mentioned in the overall results, there was a single outlier image in which AWS Textract could not recognize any text. While the product shows more than 95% text accuracy on all other images, this instance reduced AWS Textract’s performance and widened its confidence interval.
As this instance might be an exception, we also wanted to compare the products without it. We called this image the “trouble-maker” and re-ran the benchmark to see whether its exclusion made a difference.
Here are the new results when the “trouble-maker” is excluded from the dataset.

When the “trouble-maker” is excluded, AWS Textract becomes the top performer with an almost perfect (99.3%) text accuracy and a narrow confidence interval. While the scores do not change much, GCP Vision and AWS Textract remain the top two products in terms of text accuracy.
Results without Handwriting Recognition
The main factor reducing the text accuracy of certain products is the images that include handwriting. Thus, we excluded all such images (all of Category 2 and six images from Category 3) and re-evaluated the text accuracy performance.

The results are much closer when handwritten images are excluded. AWS Textract and GCP Vision remain the top two products in the benchmark, but ABBYY FineReader also performs very well (99.3%) this time. Although all products perform above 95% accuracy when handwriting is excluded, Azure Computer Vision and Tesseract OCR still have issues with scanned documents, which puts them behind in this comparison.
Benchmarked products
We tested five OCR products to measure their text accuracy. We used the versions available as of May 2021:
- ABBYY FineReader 15
- Amazon Textract
- Google Cloud Platform Vision API
- Microsoft Azure Computer Vision API
- Tesseract OCR Engine
Dataset
Although there are many image datasets for OCR, they
- are mostly at the character level and do not reflect real business use cases,
- or focus on the text location rather than the text itself.
Thus, we decided to create our own dataset under three main categories:
- Category 1 – Web page screenshots that include text: This category includes screenshots from random Wikipedia pages and Google search results for random queries.
- Category 2 – Handwriting: This category includes random photos that include different handwriting styles.
- Category 3 – Receipts, invoices, and scanned contracts: This category includes a random collection of receipts, handwritten invoices, and scanned insurance contracts collected from the internet.
All input files are in .jpg or .png format.
Limitations
- Limited dataset: Originally, we had a fourth category consisting of photos of newspapers, intended to observe the products’ performance on photos of printed documents. However, these photos included too much text, which made it hard to generate the ground truth, so we decided not to use them.
- Inconsistencies in output formats: Many images contain separate blocks of text on the left and right-hand sides. The products extract these texts in different orders, causing the output files to differ even when the text is accurately detected. This prevented us from using order-sensitive distance measures (like Levenshtein distance) and limited our options for calculating text accuracy.
- Possible problem with cosine distance: Cosine similarity is computed over embeddings. For example, comparing the sentences “I like tea” and “I like coffee” gives a higher similarity score than it should (demonstrated in the sketch below). However, cases like detecting the word “tea” as “coffee” would hardly ever occur, so we did not take this possibility into account in this exercise.
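A quick way to see this caveat in practice, assuming the en_core_web_md spaCy model:

```python
# Assumes the en_core_web_md spaCy model with word vectors.
import spacy

nlp = spacy.load("en_core_web_md")
# The score is high because "tea" and "coffee" have similar embeddings,
# even though a character-level measure would flag the substitution.
print(nlp("I like tea").similarity(nlp("I like coffee")))
```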
We use other market data (e.g. software reviews, customer case studies) to rank software providers. Feel free to check out our list of OCR providers. However, since most corporates use the term “OCR” when searching for data extraction solutions (i.e. including those that generate machine-readable data), our list has a larger scope and more companies than those presented in this benchmarking exercise.
FAQ
What is OCR?
Optical Character Recognition (OCR) is a field of machine learning that specializes in recognizing characters within images such as scanned documents, printed books, or photos. Although it is a mature technology, there are still no OCR products that can recognize all kinds of text with 100% accuracy. Among the products we benchmarked, only a few could produce successful results on our test set.
OCR tools are used by companies to identify texts and their positions in images, classify business documents according to subjects, or conduct key-value pairing within documents. Based on OCR results, other technology companies build applications like document automation. For all these business cases, accurate text recognition is critical for an OCR product.
External Links
- 1. pixparse/idl-wds · Datasets at Hugging Face. Pixel Parsing