Research Article

Cloud Based WEB Application Design for Automatic Turkish Business Card Recognition and Its Performance Evaluation

Year: 2022, Volume: 10, Issue: 1, Pages: 118-134, Published: 30.03.2022
https://doi.org/10.29109/gujsc.1030997

Abstract

This study presents a digital business card holder application that stores physical Turkish-language business cards in a cloud-based database. The software converts the information on a photographed business card into text using optical character recognition (OCR), then separates and groups the resulting text with purpose-built algorithms. Finally, the digitized card data is stored in the cloud-based database for later use. Turkish business cards pose two challenges: characters specific to the Turkish language and a wide variety of complex card layouts unique to the country. Accordingly, the study first identified a method that recognizes Turkish characters correctly. Meaningful fields such as name, mobile phone number, e-mail address, company title, and position were then extracted from the recognized text. A dedicated method was developed for each field, and these field-based algorithms yielded more accurate and meaningful data. The cloud-based, platform-independent interface lets a single user access the data from multiple devices over the internet. The study also provides a layered service architecture and database infrastructure that allow multiple accounts, each with multiple users, to use a single platform simultaneously. In analyses performed with the developed software, 15 business cards with different characteristics were read with an accuracy rate above 80%.
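The field-based separation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' actual rules, so everything here is an assumption made for illustration: the function name `extract_fields`, the two regular expressions, the assumed Turkish mobile number format (optional `+90` or `0` prefix followed by a `5xx` operator code), and the choice to collect unmatched lines under `other`. The input is taken to be plain text already produced by an OCR engine.

```python
import re

# Hypothetical patterns: the paper's real field rules are not stated in the abstract.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed Turkish mobile format: optional +90 or 0, then 5xx and seven more digits,
# with optional spaces, dots, or hyphens between the digit groups.
MOBILE_RE = re.compile(
    r"(?<!\d)(?:\+?90|0)?\s*\(?(5\d{2})\)?[\s.-]*(\d{3})[\s.-]*(\d{2})[\s.-]*(\d{2})(?!\d)"
)


def extract_fields(ocr_text: str) -> dict:
    """Group OCR output lines into fields, one rule per field."""
    fields = {"email": None, "mobile": None, "other": []}
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        email = EMAIL_RE.search(line)
        mobile = MOBILE_RE.search(line)
        if email and fields["email"] is None:
            # E-mail addresses are case-insensitive in practice, so store lowercase.
            fields["email"] = email.group(0).lower()
        elif mobile and fields["mobile"] is None:
            # Normalize to an international form: +90 plus the ten national digits.
            fields["mobile"] = "+90" + "".join(mobile.groups())
        else:
            # Name, title, company, and address lines would need their own rules.
            fields["other"].append(line)
    return fields


if __name__ == "__main__":
    sample = "Ayşe Yılmaz\nYazılım Mühendisi\nGSM: 0532 123 45 67\nE-posta: Ayse.Yilmaz@ornek.com.tr"
    print(extract_fields(sample))
```

Pattern-based rules of this kind suit fields with a fixed shape, such as e-mail addresses and phone numbers. Fields like name, position, and company title have no fixed shape, so they would need other cues, for example keyword lists or the position of the line on the card.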

References

  • Kakani B. V., Gandhi D., Jani S., Improved OCR based automatic vehicle number plate recognition using features trained neural network, In 2017 8th international conference on computing, communication and networking technologies, (2017) 1-6.
  • Shen H., Coughlan J. M., Towards a real-time system for finding and reading signs for visually impaired users, In International Conference on Computers for Handicapped Persons, Springer, Berlin, Heidelberg, (2012) 41-47.
  • Emekligil E., Arslan S., Agin O., A bank information extraction system based on named entity recognition with CRFs from noisy customer order texts in Turkish, In International Conference on Knowledge Engineering and the Semantic Web, Springer, Cham, (2016) 93-102.
  • Chauhan P., Luthra P., Ahmad Ansari I., Road Sign Detection Using Camera for Automated Driving Assistance System. In Proceedings of the International Conference on Advances in Electronics, Electrical & Computational Intelligence (ICAEEC), (2019).
  • Thuan N. H., Nhan D. T., Toan L. T., Giang N. X. H., Truong Q. B., An Android Business Card Reader Based on Google Vision: Design and Evaluation, In Context-Aware Systems and Applications, and Nature of Computation and Communication, Springer, Cham, (2019) 223-236.
  • Hung P. D., Linh D. Q., Implementing an android application for automatic vietnamese business card recognition, Pattern Recognition and Image Analysis, 29(1) (2019) 156-166.
  • Saiga H., Nakamura Y., Kitamura Y., Morita T., An OCR system for business cards, IEEE In Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR'93), (1993) 802-805.
  • Chiou Y. H., Lee H. J., Recognition of Chinese business cards, IEEE in Proceedings of the Fourth International Conference on Document Analysis and Recognition, 2 (1997) 1028-1032.
  • Wang Y. K., Fan K. C., Juang Y. T., Chen T. H., Using hidden Markov model for chinese business card recognition, IEEE In Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), 1 (2001) 1106-1109.
  • Dangiwa B. A., Kumar S. S., A business card reader application for iOS devices based on Tesseract, IEEE In 2018 International Conference on Signal Processing and Information Security (ICSPIS), (2018) 1-4.
  • Shinde A., Tungar M., Khairnar P., Gunjkar J., Energy Efficient Business Card Recognition and Translation over Cloud Computing using Google Vision, GRD Journals- Global Research and Development Journal for Engineering, 2(4) (2017) 80-84.
  • Tesseract_(software), https://en.wikipedia.org/wiki/Tesseract_(software), Accessed: 22.11.2021.
  • Smith R., An overview of the Tesseract OCR engine, IEEE In Ninth international conference on document analysis and recognition, 2 (2007) 629-633.
  • Smith R. W., Hybrid page layout analysis via tab-stop detection, IEEE In 2009 10th International Conference on Document Analysis and Recognition, (2009) 241-245.
  • Smith R., Antonova D., Lee D. S., Adapting the Tesseract open source OCR engine for multilingual OCR, In Proceedings of the International Workshop on Multilingual OCR, (2009) 1-8.
  • Smith R., Limits on the application of frequency-based language models to OCR, IEEE In 2011 International Conference on Document Analysis and Recognition, (2011) 538-542.
  • Lee D. S., Smith R., Improving book ocr by adaptive language and image models, IEEE In 2012 10th IAPR International Workshop on Document Analysis Systems, (2012) 115-119.
  • Unnikrishnan R., Smith R., Combined script and page orientation estimation using the tesseract ocr engine, IEEE In Proceedings of the international workshop on multilingual OCR, (2009) 1-7.
  • Rice S. V., Jenkins F. R., Nartker T. A., The fourth annual test of OCR accuracy, Technical Report 95, 3 (1995) 1-39.
  • Okutucu B. O., Bulut Bilişim ve Teknolojileri, Master's Thesis, İstanbul Okan University, Department of Computer Engineering, (2012).
  • Tesseract OCR, https://github.com/tesseract-ocr, Accessed: 22.11.2021.

Turkish title: Otomatik Türkçe Kartvizit Tanıma için Bulut Tabanlı WEB Uygulama Tasarımı ve Performans Değerlendirmesi

There are 21 citations in total.

Details

Primary Language: English
Subjects: Engineering
Journal Section: Design and Technology (Tasarım ve Teknoloji)
Authors:

İbrahim Şahin (ORCID: 0000-0001-6714-9182)

Mustafa Hikmet Bilgehan Uçar (ORCID: 0000-0002-9023-0023)

Serdar Solak (ORCID: 0000-0003-1081-1598)

Early Pub Date: March 22, 2022
Publication Date: March 30, 2022
Submission Date: December 1, 2021
Published in Issue: Year 2022, Volume: 10, Issue: 1

Cite

APA Şahin, İ., Uçar, M. H. B., & Solak, S. (2022). Cloud Based WEB Application Design for Automatic Turkish Business Card Recognition and Its Performance Evaluation. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 10(1), 118-134. https://doi.org/10.29109/gujsc.1030997

e-ISSN: 2147-9526