Examples of Optical Character Recognition Tools
Research article Open access | Available online on: 25 November, 2020 | Last update: 27 October, 2021
Optical Character Recognition (OCR) refers to computer processes that convert images of printed, typed, or handwritten text into text files. A computer requires OCR software to perform this task. The software retrieves the text in the image and saves it in a file that can be edited in a word processor or stored in a database or on another medium that a computer system can use. Today, many OCR engines are in use, including Google Drive OCR, Tesseract, Transym, and OmniPage. Many of them are paid, but some are available for free.
Recognizing Arabic text is a popular research topic, and a significant amount of research effort is being invested in increasing the accuracy of Arabic OCR through different approaches and technologies. In 2002, a system to recognize Arabic text was developed using a set of moment-invariant descriptors as features and an artificial neural network (ANN) for classification. The study reported an accuracy rate of 90%. Another research project, made in 2017, used a database of 34,000 characters: 70% for training the machine learning model, 15% for testing, and 15% for validation. The project achieved a 98.27% recognition rate. In 2018, a project aiming to recognize Arabic handwriting used a dataset of more than 43,000 handwritten Arabic phrases, 30,000 for training and 13,000 for testing. The recognition results showed a 99% accuracy rate.
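To make the 70% / 15% / 15% split of the 34,000-character database concrete, the subset sizes can be worked out with a few lines of Python. This is only an illustrative sketch: the function name `split_sizes` and the subset labels are hypothetical, and the cited project does not describe how it actually partitioned its data.

```python
def split_sizes(total, percents):
    """Return the number of samples in each subset.

    `percents` maps a subset name to a whole-number percentage.
    (Hypothetical helper for illustration; not taken from the cited study.)
    """
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")

    # Integer arithmetic avoids floating-point rounding issues.
    sizes = {name: total * pct // 100 for name, pct in percents.items()}

    # Integer division can leave a few samples unassigned;
    # give them to the first subset so the sizes add up to `total`.
    remainder = total - sum(sizes.values())
    first = next(iter(sizes))
    sizes[first] += remainder
    return sizes


sizes = split_sizes(34_000, {"train": 70, "test": 15, "validation": 15})
print(sizes)  # {'train': 23800, 'test': 5100, 'validation': 5100}
```

Under these assumptions, roughly 23,800 characters would be used for training and 5,100 each for testing and validation.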
A number of tools and services have emerged in the market as a result of advances in such research. The quality and accuracy of OCR tools have improved over the years. Today, a wide range of OCR solutions is available, from simple to complex. Some of these tools require programming skills to use, while others are ready-to-use, off-the-shelf solutions. Costs vary with features and accuracy, and some OCR tools are available for free. Details of the best-known OCR resources in the market are provided in the table below:
| Name | Founded Year | License | Online | Programming Language | SDK | Arabic Language |
|---|---|---|---|---|---|---|
| Google Cloud Vision | 2016 | Proprietary | Yes | Unknown | Yes | Arabic (Modern Standard) + more than 200 |
| Tesseract | 1985 | Apache | No | C++, C | Yes | Arabic + more than 100 |
| ABBYY FineReader | 1989 | Proprietary | Yes | C/C++ | Yes | Arabic + 192 |
| Asprise OCR SDK | 1998 | Proprietary | Yes | Java, C#, VB.NET, C/C++/Delphi | Yes | Arabic not supported + 20 |
| AnyDoc Software | 1989 | Proprietary | No | VBScript | | Arabic not supported |
| CuneiForm | 1996 | BSD variant | No | C/C++ | Yes | Arabic not supported |
| Dynamsoft OCR SDK | 2003 | Proprietary | Yes | C/C++ | Yes | Arabic + 40 |
| OmniPage | 1970s | Proprietary | Yes | C/C++, C# | Yes | Arabic + 125 |
| Microsoft Office Document Imaging | – | Proprietary | No | – | | Arabic |
| Puma.NET | 2006 | BSD | No | C# | Yes | Arabic not supported + 28 |
| ReadSoft | – | Proprietary | No | – | | Arabic not supported |
| OCRFeeder | 2009 | GPL | No | Python | | Arabic not supported |
| OCRopus | 2007 | Apache | No | Python | | All languages using Latin script (other languages can be trained) |
- M. A. Awel and A. I. Abidi, “Review on optical character recognition,” International Research Journal of Engineering and Technology (IRJET), p-ISSN: 2395-0072, vol. 6, no. 6, June 2019.
- M. M. Altuwaijri and M. A. Bayoumi, “Arabic text recognition using neural networks,” pp. 415–418, 2002.
- N. Lamghari, M. E. H. Charaf, and S. Raghay, “Hybrid Feature Vector for the Recognition of Arabic Handwritten Characters Using Feed-Forward Neural Network,” Arab. J. Sci. Eng., vol. 43, no. 12, pp. 7031–7039, 2018.
- N. A. Jebril, H. R. Al-Zoubi, and Q. Abu Al-Haija, “Recognition of Handwritten Arabic Characters using Histograms of Oriented Gradient (HOG),” Pattern Recognit. Image Anal., vol. 28, no. 2, pp. 321–345, 2018.