Examples of Optical Character Recognition Tools

Oussama El Ghoul

Research article Online Open access | Available online on: 25 November, 2020 | Last update: 27 October, 2021

Nafath, Volume 6, Issue 15

Optical Character Recognition (OCR) refers to computer processes that convert images of printed, typed, or handwritten text into text files. A computer requires OCR software to perform this task. The software retrieves the text in the image and saves it in a file that can be edited in a word processor, stored in a database, or kept on any other medium a computer system can use. Today, many OCR engines are in use, including Google Drive OCR, Tesseract, Transym, and OmniPage. Many of them are paid; some, however, are available for free.
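As a rough illustration of the image-to-text-file workflow described above, the following minimal sketch uses Tesseract through the third-party `pytesseract` and Pillow packages. It assumes that both packages, the Tesseract engine, and its Arabic language data (`ara`) are installed. The function name and file paths are hypothetical and chosen only for this example.

```python
from pathlib import Path


def image_to_text_file(image_path, output_path, lang="ara"):
    """Run OCR on one image and save the recognized text as UTF-8."""
    # Third-party imports are kept local so this module still loads on
    # machines where Pillow or pytesseract is not installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        # lang selects the Tesseract language data ("ara" = Arabic).
        text = pytesseract.image_to_string(image, lang=lang)

    # The saved file can then be opened in a word processor or
    # loaded into a database.
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

A call such as `image_to_text_file("scan.png", "scan.txt")` would write the recognized text next to the scanned image. Recognition quality depends on the image and on the installed language data.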

Recognizing Arabic text is a popular research topic, and a significant amount of research effort is being invested in increasing the accuracy of Arabic OCR using different approaches and technologies. In 2002, a system was developed that recognizes Arabic text using a set of moment-invariant descriptors as features and an artificial neural network (ANN) for classification [1]. The study reported a high accuracy rate of 90% [2]. Another research project, carried out in 2017, used a database of 34,000 characters: 70% for training the machine learning model, 15% for testing, and 15% for validation. The project achieved a 98.27% recognition rate [3]. In 2018, a project aiming to recognize Arabic handwriting used a dataset of more than 43,000 handwritten Arabic phrases, 30,000 for training and 13,000 for testing. It reported a recognition accuracy of 99% [4].
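To make the dataset splits quoted above concrete, the short sketch below works out the sample counts they imply. The helper name `split_sizes` is hypothetical, and the 2018 figures are treated as the rounded totals given in the text (30,000 and 13,000), not exact counts from the original study.

```python
def split_sizes(total, percentages):
    """Split `total` samples according to integer percentages."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return {name: total * pct // 100 for name, pct in percentages.items()}


# 2017 project [3]: 34,000 characters split 70% / 15% / 15%.
sizes_2017 = split_sizes(
    34_000, {"training": 70, "testing": 15, "validation": 15}
)
print(sizes_2017)
# {'training': 23800, 'testing': 5100, 'validation': 5100}

# 2018 project [4]: about 30,000 training and 13,000 testing phrases.
n_train, n_test = 30_000, 13_000
train_share = round(100 * n_train / (n_train + n_test), 1)
print(train_share)  # 69.8
```

Under these assumptions, the 2017 project trained on 23,800 characters and held out 5,100 each for testing and validation, while the 2018 project used roughly a 70/30 split between training and testing.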

A number of tools and services have emerged in the market as a result of advances in this research, and the accuracy of OCR tools has improved over the years. Today, a wide range of OCR solutions is available, from simple to complex. Some of these tools require programming skills, while others are ready-to-use, off-the-shelf solutions. Costs vary with features and accuracy, and some OCR tools are available for free. Details of the best-known OCR tools on the market are provided in the table below:

| Name | Founded Year | License | Online | Programming Language | SDK | Arabic Language (+ other languages) |
|---|---|---|---|---|---|---|
| QATIP | 2016 | Free | Yes | Unknown | | Arabic |
| Google Cloud Vision | 2016 | Proprietary | Yes | Unknown | Yes | Arabic; Modern Standard + more than 200 |
| Tesseract | 1985 | Apache | No | C++, C | Yes | Arabic + more than 100 |
| ABBYY FineReader | 1989 | Proprietary | Yes | C/C++ | Yes | Arabic + 192 |
| Asprise OCR SDK | 1998 | Proprietary | Yes | Java, C#, VB.NET, C/C++/Delphi | Yes | Arabic not supported + 20 |
| AnyDoc Software | 1989 | Proprietary | No | VBScript | | Arabic not supported |
| CuneiForm | 1996 | BSD variant | No | C/C++ | Yes | Arabic not supported |
| Dynamsoft OCR SDK | 2003 | Proprietary | Yes | C/C++ | Yes | Arabic + 40 |
| OmniPage | 1970s | Proprietary | Yes | C/C++, C# | Yes | Arabic + 125 |
| Ocrad | 2003 | GPL | Yes | C++ | Yes | Latin alphabet |
| SmartScore | 1991 | Proprietary | No | | | Music |
| Microsoft Office Document Imaging | | Proprietary | No | | | Arabic |
| Puma.NET | 2006 | BSD | No | C# | Yes | Arabic not supported + 28 |
| ReadSoft | | Proprietary | No | | | Arabic not supported |
| OCRFeeder | 2009 | GPL | No | Python | | Arabic not supported |
| OCRopus | 2007 | Apache | No | Python | | All languages using Latin script (other languages can be trained) |
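For readers who want to shortlist tools programmatically, the sketch below encodes the name, license, and Arabic-support columns of the table above as records and filters them. The `OcrTool` type and the boolean `arabic` flag are assumptions made for this example: the flag is one reading of the last column, where tools listed only as "Latin alphabet", "Music", or "Latin script" are treated as not supporting Arabic out of the box.

```python
from typing import NamedTuple


class OcrTool(NamedTuple):
    name: str
    license: str
    arabic: bool  # one reading of the table's last column


TOOLS = [
    OcrTool("QATIP", "Free", True),
    OcrTool("Google Cloud Vision", "Proprietary", True),
    OcrTool("Tesseract", "Apache", True),
    OcrTool("ABBYY FineReader", "Proprietary", True),
    OcrTool("Asprise OCR SDK", "Proprietary", False),
    OcrTool("AnyDoc Software", "Proprietary", False),
    OcrTool("CuneiForm", "BSD variant", False),
    OcrTool("Dynamsoft OCR SDK", "Proprietary", True),
    OcrTool("OmniPage", "Proprietary", True),
    OcrTool("Ocrad", "GPL", False),
    OcrTool("SmartScore", "Proprietary", False),
    OcrTool("Microsoft Office Document Imaging", "Proprietary", True),
    OcrTool("Puma.NET", "BSD", False),
    OcrTool("ReadSoft", "Proprietary", False),
    OcrTool("OCRFeeder", "GPL", False),
    OcrTool("OCRopus", "Apache", False),
]

# Tools that list Arabic support and are not under a proprietary license.
free_arabic = [
    tool.name
    for tool in TOOLS
    if tool.arabic and tool.license != "Proprietary"
]
print(free_arabic)  # ['QATIP', 'Tesseract']
```

With this reading of the table, QATIP and Tesseract are the only listed tools that combine Arabic support with a non-proprietary license.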

References

  1. M. A. Awel and A. I. Abidi, “Review on optical character recognition,” International Research Journal of Engineering and Technology (IRJET), p-ISSN: 2395-0072, vol. 6, no. 6, June 2019.
  2. M. M. Altuwaijri and M. A. Bayoumi, “Arabic text recognition using neural networks,” pp. 415–418, 2002.
  3. N. Lamghari, M. E. H. Charaf, and S. Raghay, “Hybrid Feature Vector for the Recognition of Arabic Handwritten Characters Using Feed-Forward Neural Network,” Arab. J. Sci. Eng., vol. 43, no. 12, pp. 7031–7039, 2018.
  4. N. A. Jebril, H. R. Al-Zoubi, and Q. Abu Al-Haija, “Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG),” Pattern Recognit. Image Anal., vol. 28, no. 2, pp. 321–345, 2018.
