In many fields, there is high demand for transferring the information contained
in printed or handwritten documents and images to computer storage so that it
can later be reused by computer.
One simple way to store the information in these printed documents on a
computer system is to scan the documents and store them as image files.
However, it would be very difficult to read or query text or other information
from these image files in order to reuse it. Therefore, a technique is needed
to automatically retrieve and store information, in particular text, from image
files. Optical character recognition (OCR) is an active research area that
attempts to develop a computer system with the ability to extract and process
text from images automatically. The objective of OCR is to convert any form of
text or text-containing document, such as handwritten text or printed or
scanned text images, into an editable digital format for further processing.
OCR thus enables a machine to recognize text in such documents automatically.
Several major challenges must be identified and handled to achieve successful
automation. The font characteristics of the characters in paper documents and
the quality of the images are only some of these challenges. Because of them,
characters are sometimes not recognized correctly by the computer system. In
this paper we
investigate OCR from four perspectives. First, we give a detailed overview of
the challenges that may emerge in the stages of OCR. Second, we review the
general phases of an OCR system: pre-processing, segmentation, normalization,
feature extraction, classification, and post-processing. Third, we highlight
the developments and main applications of OCR. Finally, we briefly discuss the
history of OCR. Together, these discussions provide a comprehensive review of
the state of the art in the field.
Subjects | Engineering |
---|---|
Section | Research Article |
Authors | |
Publication Date | 1 December 2016 |
Published in Issue | Year 2016 |