Video Tabanlı Sınıf Yoklamasının Derin Öğrenme ve Makine Öğrenmesi Temelli Hibrit Bir Yaklaşımla Gerçek Zamanlı Olarak Elde Edilmesi

Pınar İplikçi Ekincioğlu; Serkan Keser

doi:10.51764/smutgd.1795569

Research Article

Real-Time Acquisition of Video-Based Classroom Attendance Using a Hybrid Approach Based on Deep Learning and Machine Learning

Year 2025, Volume: 8, Issue: 2, 163-172

Abstract

This study proposes a hybrid system that generates real-time attendance records from in-class video streams by combining face detection, face recognition, feature extraction, multiple classifiers, and majority voting. In the first stage, a Viola–Jones-based cascade detector identifies face candidates, and a Convolutional Neural Network (CNN) validator classifies each candidate as “face/not face” to eliminate false positives. From the verified faces, Histogram of Oriented Gradients (HOG) features, CNN features, and AlexNet-fc7 (LEX: Layer Extraction from AlexNet fc7) representations are extracted. For classification, a Support Vector Machine (SVM) with a radial basis function (RBF) kernel, k-Nearest Neighbour (KNN), Random Forest (RF), Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit (GRU), and CNN models were evaluated. A hybrid configuration that combines all classifiers through majority voting was also implemented. The proposed structure achieved high accuracy across different student counts (4–12) and classroom scenarios, with the hybrid, GRU, and BiLSTM models in particular yielding stable results. The system provides unobtrusive and rapid attendance acquisition using only a camera and a computer, without requiring any additional hardware.
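
The majority-voting step of the hybrid configuration can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `majority_vote` and `mark_attendance`, the student labels, the tie-handling policy (a tie yields no decision), and the rule that a student counts as present once recognized in any frame are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def majority_vote(labels: Sequence[str], min_votes: Optional[int] = None) -> Optional[str]:
    """Return the label chosen by most classifiers, or None if undecided.

    Assumed policy (not from the paper): a tie for first place, or a winner
    with fewer than `min_votes` votes, produces no decision.
    """
    if not labels:
        return None
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    # Reject a tie between the two most frequent labels.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    if min_votes is not None and top_count < min_votes:
        return None
    return top_label


def mark_attendance(
    frame_votes: Sequence[Sequence[Sequence[str]]], roster: Sequence[str]
) -> Dict[str, bool]:
    """Mark a student present if any verified face in any frame is voted to them."""
    present = {name: False for name in roster}
    for faces in frame_votes:        # one entry per video frame
        for predictions in faces:    # one entry per verified face
            winner = majority_vote(predictions)
            if winner in present:
                present[winner] = True
    return present


# Hypothetical predictions from six classifiers for one face in each of two frames.
frames: List[List[List[str]]] = [
    [["student_a", "student_a", "student_b", "student_a", "student_a", "student_a"]],
    [["student_b", "student_a", "student_b", "student_a", "student_c", "student_c"]],
]
print(mark_attendance(frames, ["student_a", "student_b", "student_c"]))
```

In this example the first face is assigned to `student_a` by five of six votes, while the second face produces a three-way tie and is ignored. Treating ties as "no decision" avoids marking a student present on weak evidence; a real system might instead fall back to the most reliable single classifier.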

Keywords

Face detection, face recognition, HOG, AlexNet, class attendance

References

Ahonen, T., Hadid, A., & Pietikäinen, M. (2006). Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 2037–2041.
Cho, K., van Merriënboer, B., Gulcehre, C., et al. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. EMNLP, 1724–1734.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (Vol. 1, pp. 886–893). IEEE, doi: 10.1109/CVPR.2005.177.
Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive angular margin loss for deep face recognition. CVPR, 4690–4699, doi: 10.1109/CVPR.2019.00482.
Deng, J., Guo, J., Zhou, Y., et al. (2020). RetinaFace: Single-shot multi-level face localisation in the wild. CVPR, 5202–5211.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS, 1097–1105.
Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. BMVC.
Patil, A. R., & Shukla, A. (2014). Automated attendance using face recognition. International Journal of Computer Applications, 103(16), 6–10.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. CVPR, 815–823, doi: 10.1109/CVPR.2015.7298682.
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. CVPR, 511–518, doi: 10.1109/CVPR.2001.990517.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multi-task cascaded convolutional networks. Signal Processing Letters, 23(10), 1499–1503.

There are 12 references in total.

Details

Primary Language	Turkish
Subjects	Circuits and Systems
Section	Articles
Authors	Pınar İplikçi Ekincioğlu 0009-0007-8296-5517; Serkan Keser 0000-0001-8435-0507
Early View Date	November 6, 2025
Publication Date	November 8, 2025
Submission Date	October 2, 2025
Acceptance Date	October 26, 2025
Published in Issue	Year 2025, Volume: 8, Issue: 2

Cite

APA	Ekincioğlu, P. İ., & Keser, S. (2025). Video Tabanlı Sınıf Yoklamasının Derin Öğrenme ve Makine Öğrenmesi Temelli Hibrit Bir Yaklaşımla Gerçek Zamanlı Olarak Elde Edilmesi. Sürdürülebilir Mühendislik Uygulamaları ve Teknolojik Gelişmeler Dergisi, 8(2), 163-172. https://doi.org/10.51764/smutgd.1795569