Research Article

An End-to-End Deep Learning Architecture for Information Extraction from Turkish Identity Cards

Year 2025, Volume: 5, Issue: 2, pp. 81 - 86, 23.12.2025
https://doi.org/10.54569/aair.1845016

Abstract

With the acceleration of digital transformation in the service sector, remote customer acquisition and identity verification have become cornerstones of secure ecosystems. In internet-based services in particular, the image distortions, perspective errors, and variable lighting conditions that arise when physical documents are digitized are the most significant factors complicating data extraction. This study proposes a deep learning-based end-to-end architecture for fast, secure, and high-accuracy information extraction from Turkish Republic Identity Cards. In the proposed system, CURL is used to enhance image quality, and a YOLOv8m-based instance segmentation model detects the boundaries of the card. To determine card orientation and correct perspective, a novel hybrid approach was developed that analyzes the cosine distance between the face biometrics obtained via RetinaFace and the segmentation mask. The architecture, which integrates PAN for text detection and Transformer-based TrOCR models for character recognition, was tested on an original dataset augmented with the CLoDSA library. Experimental results show that the YOLOv8m model achieved a 99.5% mAP score in card detection. With an overall accuracy of 92.6%, the proposed model offers an efficient solution for digital identity verification processes.
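The orientation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which vectors enter the cosine distance, so the sketch assumes that the vector from the segmentation mask centroid to the detected face centre is compared with a hypothetical reference direction (photo to the left of the card centre on an upright card). The names `EXPECTED_DIRECTION`, `CANDIDATE_ROTATIONS`, and `estimate_rotation` are illustrative only, and the face and mask coordinates are taken as given rather than produced by RetinaFace or YOLOv8m.

```python
import math

# Hypothetical reference direction: on an upright card the face photo is
# assumed to lie to the left of the card centre (image x grows rightwards).
EXPECTED_DIRECTION = (-1.0, 0.0)
CANDIDATE_ROTATIONS = (0, 90, 180, 270)


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v): 0 = same direction, 2 = opposite."""
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero-length vector")
    return 1.0 - (u[0] * v[0] + u[1] * v[1]) / norm


def rotate(v, degrees):
    """Rotate a 2D vector with the standard rotation matrix for the given angle."""
    t = math.radians(degrees)
    return (v[0] * math.cos(t) - v[1] * math.sin(t),
            v[0] * math.sin(t) + v[1] * math.cos(t))


def estimate_rotation(mask_centroid, face_center):
    """Pick the candidate rotation that best aligns the centroid-to-face
    vector with EXPECTED_DIRECTION (smallest cosine distance)."""
    offset = (face_center[0] - mask_centroid[0],
              face_center[1] - mask_centroid[1])
    return min(CANDIDATE_ROTATIONS,
               key=lambda deg: cosine_distance(rotate(offset, deg),
                                               EXPECTED_DIRECTION))


if __name__ == "__main__":
    # Face detected directly above the mask centroid (image y grows downwards).
    print(estimate_rotation((200.0, 150.0), (200.0, 60.0)))
```

Under this sign convention the example call selects 270 degrees, and a face already to the left of the centroid selects 0. A real pipeline would then apply the chosen rotation together with a perspective warp of the card corners; that step is omitted here because the abstract gives no details about it.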

Supporting Institution

Burdur Mehmet Akif Ersoy University Scientific Research Projects Commission

Project Number

0832-YL-22

Acknowledgements

This study was supported by Burdur Mehmet Akif Ersoy University Scientific Research Projects Commission. Project Number: 0832-YL-22. This study was produced from the first author's thesis titled 'Image Processing Based Identity Recognition and Liveness Analysis System with Microservice'.

References

  • U. Mir, A. K. Kar, and M. P. Gupta, "AI-enabled digital identity–inputs for stakeholders and policymakers," Journal of Science and Technology Policy Management, vol. 13, no. 3, pp. 514-541, 2022.
  • B. K. Bulatovich et al., "MIDV-2020: A comprehensive benchmark dataset for identity document analysis," Компьютерная оптика, vol. 46, no. 2, pp. 252-270, 2022.
  • M. K. Gupta, R. Shah, J. Rathod, and A. Kumar, "Smartidocr: Automatic detection and recognition of identity card number using deep networks," in 2021 Sixth International Conference on Image Information Processing (ICIIP), 2021, vol. 6: IEEE, pp. 267-272.
  • W. Yu, N. Lu, X. Qi, P. Gong, and R. Xiao, "Pick: processing key information extraction from documents using improved graph learning-convolutional networks," in 2020 25th International conference on pattern recognition (ICPR), 2021: IEEE, pp. 4363-4370.
  • P. Zhang et al., "Trie: end-to-end text reading and information extraction for document understanding," in Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 1413-1422.
  • Z. Gu et al., "Xylayoutlm: Towards layout-aware multimodal networks for visually-rich document understanding," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 4583-4592.
  • Y. Huang, T. Lv, L. Cui, Y. Lu, and F. Wei, "Layoutlmv3: Pre-training for document ai with unified text and image masking," in Proceedings of the 30th ACM international conference on multimedia, 2022, pp. 4083-4091.
  • J. Wang, L. Jin, and K. Ding, "Lilt: A simple yet effective language-independent layout transformer for structured document understanding," arXiv preprint arXiv:2202.13669, 2022.
  • S. Appalaraju, B. Jasani, B. U. Kota, Y. Xie, and R. Manmatha, "Docformer: End-to-end transformer for document understanding," in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 993-1003.
  • M. Li et al., "Trocr: Transformer-based optical character recognition with pre-trained models," in Proceedings of the AAAI conference on artificial intelligence, 2023, vol. 37, no. 11, pp. 13094-13102.
  • Z. Tang et al., "Unifying vision, text, and layout for universal document processing," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 19254-19264.
  • S. Freitter, "Automating ID Card Verification leveraging Deep Learning," Technische Universität Wien, 2025.
  • B. Sekachev et al., "opencv/cvat: v1.1.0," Zenodo, 2020.
  • S. Moran, S. McDonagh, and G. Slabaugh, "Curl: Neural curve layers for global image enhancement," in 2020 25th International Conference on Pattern Recognition (ICPR), 2021: IEEE, pp. 9796-9803.
  • Á. Casado-García et al., "CLoDSA: a tool for augmentation in classification, localization, detection, semantic segmentation and instance segmentation tasks," BMC bioinformatics, vol. 20, no. 1, p. 323, 2019.
  • J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou, "Retinaface: Single-shot multi-level face localisation in the wild," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 5203-5212.
  • S. Yang, P. Luo, C.-C. Loy, and X. Tang, "Wider face: A face detection benchmark," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 5525-5533.
  • W. Wang et al., "Efficient and accurate arbitrary-shaped text detection with pixel aggregation network," in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 8440-8449.

There are 18 references in total.

Details

Primary Language: English
Subjects: Computer Vision, Image Processing, Artificial Intelligence (Other)
Section: Research Article
Authors

Ömer Can Eskicioğlu 0000-0001-5644-2957

Ali Hakan Işik 0000-0003-3561-9375

Project Number: 0832-YL-22
Submission Date: December 19, 2025
Acceptance Date: December 23, 2025
Publication Date: December 23, 2025
Published in Issue: Year 2025, Volume: 5, Issue: 2

Cite

IEEE Ö. C. Eskicioğlu and A. H. Işik, “An End-to-End Deep Learning Architecture for Information Extraction from Turkish Identity Cards”, Adv. Artif. Intell. Res., vol. 5, no. 2, pp. 81–86, 2025, doi: 10.54569/aair.1845016.

Advances in Artificial Intelligence Research is an open access journal which means that the content is freely available without charge to the user or his/her institution. All papers are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows users to distribute, remix, adapt, and build upon the material in any medium or format for non-commercial purposes only, and only so long as attribution is given to the creator.

Graphic design @ Özden Işıktaş