Research Article

Classification of Documents Extracted from Images with Optical Character Recognition Methods

Volume: 6, Issue: 2, 1 June 2021

Abstract

Over the past decade, machine learning methods have given us driverless cars, voice recognition, effective web search, and a much better understanding of the human genome. Machine learning is now so common that most people use it dozens of times a day, often without knowing it. A machine that has been taught certain processes or situations can predict outcomes that are difficult for the human brain to predict. These methods also make it possible to perform, in a short time, operations that are difficult or impossible for humans to carry out. For these reasons, machine learning is highly important today. In this study, two different machine learning methods were combined to solve a real-world problem: documents were first transferred to the computer and then classified. Three basic methods were used to realize the whole process. Handwritten or printed documents were digitized with a scanner or digital camera. These documents were then processed with two different Optical Character Recognition (OCR) operations. The generated texts were then classified using the Naive Bayes algorithm. The entire project was developed on the Microsoft Visual Studio 12 platform on the Windows operating system. The C# programming language was used for all parts of the study, along with some ready-made code and DLLs.
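
The classification step mentioned in the abstract can be illustrated with a minimal sketch. The study itself was written in C#, so the Python code below is not the author's implementation. It assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing, which is one common variant for text, and the class name, tokenizer, training texts, and labels are all hypothetical. The input strings stand in for text that an OCR step would have produced.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()                  # all words seen in training

    def train(self, labeled_texts):
        """Accumulate counts from an iterable of (text, label) pairs."""
        for text, label in labeled_texts:
            self.class_doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = [w for w in tokenize(text) if w in self.vocabulary]  # skip unseen words
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        """Return the class with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training data standing in for OCR output; not taken from the study.
training_data = [
    ("invoice total amount due payment", "finance"),
    ("payment received invoice number", "finance"),
    ("patient diagnosis treatment dose", "medical"),
    ("doctor prescribed treatment for patient", "medical"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(training_data)
print(classifier.classify("invoice payment overdue"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a word that never occurred in a class from driving that class's probability to zero. Real OCR output typically contains misrecognized characters, so a practical pipeline would likely need more careful tokenization than the whitespace split assumed here.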

Details

Primary Language

English

Subjects

Artificial Intelligence

Section

Research Article

Publication Date

1 June 2021

Submission Date

19 January 2021

Acceptance Date

26 February 2021

Published in Issue

Year 2021, Volume: 6, Issue: 2

Cite

APA
Aydın, Ö. (2021). Classification of Documents Extracted from Images with Optical Character Recognition Methods. Computer Science, 6(2), 46-55. https://izlik.org/JA95DG25BF

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.