Examples of Optical Character Recognition Tools

Oussama El Ghoul

Research article | Online | Open access | Available online: 25 November 2020 | Last updated: 27 October 2021

Nafath, Volume 6, Issue 15

Optical Character Recognition (OCR) refers to computer processes that convert images of printed, typed, or handwritten text into machine-readable text files. A computer needs OCR software to perform this task. The software retrieves the text from the image and saves it to a file that can be edited in a word processor, stored in a database, or kept on any other medium a computer system can use. Many OCR engines are in use today, including Google Drive OCR, Tesseract, Transym, and OmniPage. Many of them are paid, but some are available for free.

Recognizing Arabic text is a popular research topic, and significant research effort is being invested in increasing the accuracy of Arabic OCR through different approaches and technologies. In 2002, a system for recognizing Arabic text was developed that used a set of moment-invariant descriptors as features and an artificial neural network (ANN) for classification [1]. The study reported an accuracy rate of 90% [2]. Another research project, carried out in 2017, used a database of 34,000 characters: 70% for training, 15% for validation, and 15% for testing. The project achieved a 98.27% recognition rate [3]. In 2018, a project on Arabic handwriting recognition used a dataset of more than 43,000 handwritten Arabic phrases, with 30,000 used for training and 13,000 for testing. It reported an accuracy of 99% [4].
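
The evaluation protocol described above (fixed percentages for training, validation, and testing, followed by a recognition rate measured on held-out samples) can be sketched in Python. This is a minimal illustration that assumes a simple random split and an exact-match accuracy metric; the function names are hypothetical and are not taken from the cited studies, which may have split or scored their data differently.

```python
import random


def split_sizes(total: int, percents: list[int]) -> list[int]:
    """Integer subset sizes for percentage splits that add up to 100.

    Integer arithmetic avoids floating-point rounding; any remainder is
    assigned to the last subset so the sizes always sum to `total`.
    """
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    sizes = [total * p // 100 for p in percents[:-1]]
    sizes.append(total - sum(sizes))
    return sizes


def split_indices(total: int, percents: list[int], seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices once, then cut them into consecutive subsets."""
    indices = list(range(total))
    random.Random(seed).shuffle(indices)
    subsets, start = [], 0
    for size in split_sizes(total, percents):
        subsets.append(indices[start:start + size])
        start += size
    return subsets


def recognition_rate(predicted: list[str], expected: list[str]) -> float:
    """Percentage of samples whose predicted label matches the ground truth."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


# 70/15/15 split of the 34,000-character database from the 2017 study.
train_n, val_n, test_n = split_sizes(34_000, [70, 15, 15])
print(train_n, val_n, test_n)  # 23800 5100 5100

# Share of the 30,000 + 13,000 samples used for training in the 2018 study.
print(round(100 * 30_000 / 43_000, 1))  # 69.8
```

Shuffling once with a fixed seed keeps the subsets disjoint and makes the split reproducible, so a reported recognition rate can be traced back to exactly the same test samples.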

Advances in this research have brought a number of OCR tools and services to the market. The quality, accuracy, and precision of OCR tools have improved over the years, and today a wide range of solutions is available, from simple to complex. Some tools require programming skills to set up, while others are ready-to-use, off-the-shelf solutions. Costs vary with features and accuracy, and some OCR tools are free to use. The table below details the best-known OCR tools on the market:

Name	Year Founded	License	Online	Programming Language	SDK	Language Support
QATIP	2016	Free	Yes	Unknown	–	Arabic
Google Cloud Vision	2016	Proprietary	Yes	Unknown	Yes	Arabic (Modern Standard) + more than 200
Tesseract	1985	Apache	No	C++, C	Yes	Arabic + more than 100
ABBYY FineReader	1989	Proprietary	Yes	C/C++	Yes	Arabic + 192
Asprise OCR SDK	1998	Proprietary	Yes	Java, C#, VB.NET, C/C++/Delphi	Yes	No Arabic (20 languages)
AnyDoc Software	1989	Proprietary	No	VBScript	–	No Arabic
CuneiForm	1996	BSD variant	No	C/C++	Yes	No Arabic
Dynamsoft OCR SDK	2003	Proprietary	Yes	C/C++	Yes	Arabic + 40
OmniPage	1970s	Proprietary	Yes	C/C++, C#	Yes	Arabic + 125
Ocrad	2003	GPL	Yes	C++	Yes	Latin alphabet only
SmartScore	1991	Proprietary	No	–	–	Music notation
Microsoft Office Document Imaging	–	Proprietary	No	–	–	Arabic
Puma.NET	2006	BSD	No	C#	Yes	No Arabic (28 languages)
ReadSoft	–	Proprietary	No	–	–	No Arabic
OCRFeeder	2009	GPL	No	Python	–	No Arabic
OCRopus	2007	Apache	No	Python	–	Latin-script languages (others can be trained)
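
To shortlist candidates from the comparison table above, its rows can be treated as structured records and filtered. The Python sketch below is illustrative only and rests on stated assumptions: it copies just three columns, encodes Arabic support as a yes/no flag (treating OCRopus as not supporting Arabic out of the box, since the table says other languages must be trained), and counts any license other than "Proprietary" as free to use. The `OcrTool` class and `free_arabic_tools` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrTool:
    name: str
    license: str
    supports_arabic: bool  # assumption: True only if Arabic works without extra training


# Hand-copied subset of the comparison table (name, license, Arabic support).
TOOLS = [
    OcrTool("QATIP", "Free", True),
    OcrTool("Google Cloud Vision", "Proprietary", True),
    OcrTool("Tesseract", "Apache", True),
    OcrTool("ABBYY FineReader", "Proprietary", True),
    OcrTool("Asprise OCR SDK", "Proprietary", False),
    OcrTool("AnyDoc Software", "Proprietary", False),
    OcrTool("CuneiForm", "BSD variant", False),
    OcrTool("Dynamsoft OCR SDK", "Proprietary", True),
    OcrTool("OmniPage", "Proprietary", True),
    OcrTool("Ocrad", "GPL", False),
    OcrTool("SmartScore", "Proprietary", False),
    OcrTool("Microsoft Office Document Imaging", "Proprietary", True),
    OcrTool("Puma.NET", "BSD", False),
    OcrTool("ReadSoft", "Proprietary", False),
    OcrTool("OCRFeeder", "GPL", False),
    OcrTool("OCRopus", "Apache", False),  # Arabic would require training
]


def free_arabic_tools(tools: list[OcrTool]) -> list[str]:
    """Names of tools that list Arabic support under a non-proprietary license."""
    return [t.name for t in tools if t.supports_arabic and t.license != "Proprietary"]


print(free_arabic_tools(TOOLS))  # ['QATIP', 'Tesseract']
```

Keeping the rows as immutable records makes it easy to add further criteria, such as requiring an SDK, without changing the data itself.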

References

[1] M. A. Awel and A. I. Abidi, “Review on Optical Character Recognition,” International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 6, June 2019, p-ISSN: 2395-0072.
[2] M. M. Altuwaijri and M. A. Bayoumi, “Arabic Text Recognition Using Neural Networks,” pp. 415–418, 2002.
[3] N. Lamghari, M. E. H. Charaf, and S. Raghay, “Hybrid Feature Vector for the Recognition of Arabic Handwritten Characters Using Feed-Forward Neural Network,” Arab. J. Sci. Eng., vol. 43, no. 12, pp. 7031–7039, 2018.
[4] N. A. Jebril, H. R. Al-Zoubi, and Q. Abu Al-Haija, “Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG),” Pattern Recognit. Image Anal., vol. 28, no. 2, pp. 321–345, 2018.
