Video Tabanlı Sınıf Yoklamasının Derin Öğrenme ve Makine Öğrenmesi Temelli Hibrit Bir Yaklaşımla Gerçek Zamanlı Olarak Elde Edilmesi

Pınar İplikçi Ekincioğlu; Serkan Keser

doi:10.51764/smutgd.1795569

Research Article

Real-Time Acquisition of Video-Based Classroom Attendance Using a Hybrid Approach Based on Deep Learning and Machine Learning

Year 2025, Volume: 8 Issue: 2, 163 - 172


Abstract

This study proposes a hybrid system for generating real-time attendance records from in-class video streams, combining face detection, face recognition, feature extraction, multiple classifiers, and majority voting. In the first stage, face candidates are identified with a Viola–Jones-based cascade detector, and a Convolutional Neural Network (CNN) validator classifies each candidate as “face/not face” to eliminate false positives. From the verified faces, Histogram of Oriented Gradients (HOG) features, CNN features, and AlexNet-fc7 (LEX: Layer Extraction from AlexNet fc7) representations are extracted. For classification, a Support Vector Machine (SVM) with a radial basis function (RBF) kernel, k-Nearest Neighbour (KNN), Random Forest (RF), Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit (GRU), and CNN models were evaluated. A hybrid configuration that combines all classifiers through majority voting was also implemented. The proposed structure achieved high accuracy across different student counts (4–12) and classroom recording scenarios, with the hybrid, GRU, and BiLSTM models in particular yielding stable results. The system provides unobtrusive and rapid attendance acquisition using only a camera and a computer, without requiring any additional hardware.
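The majority-voting step of the hybrid configuration can be illustrated with a minimal sketch. The abstract does not specify how ties are resolved or how votes are turned into an attendance list, so the function names (`majority_vote`, `mark_attendance`), the student labels, and the policy of discarding tied votes are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label chosen by the most classifiers, or None on a tie."""
    if not predictions:
        return None
    ranked = Counter(predictions).most_common()
    # Assumed policy: a tie for first place yields no decision.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def mark_attendance(
    face_votes: Sequence[Sequence[Hashable]],
    roster: Sequence[Hashable],
) -> Dict[Hashable, bool]:
    """Mark a student present if any verified face is voted as that student."""
    present = set()
    for votes in face_votes:
        winner = majority_vote(votes)
        if winner is not None:
            present.add(winner)
    return {student: student in present for student in roster}


# Hypothetical votes from six classifiers (SVM, KNN, RF, BiLSTM, GRU, CNN)
# for two verified faces; the second face ends in a 3-3 tie and is ignored.
example_votes = [
    ["s01", "s01", "s02", "s01", "s01", "s01"],
    ["s03", "s02", "s03", "s02", "s03", "s02"],
]
print(mark_attendance(example_votes, ["s01", "s02", "s03"]))
# prints {'s01': True, 's02': False, 's03': False}
```

Discarding tied votes is a conservative choice for an attendance setting, since a wrong "present" mark is arguably worse than a missed one; a deployed system might instead break ties by classifier confidence.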

Keywords

Face detection, face recognition, HOG, AlexNet, class attendance






Details

Primary Language	Turkish
Subjects	Circuits and Systems
Journal Section	Articles
Authors	Pınar İplikçi Ekincioğlu (ORCID: 0009-0007-8296-5517); Serkan Keser (ORCID: 0000-0001-8435-0507)
Early Pub Date	November 6, 2025
Publication Date	November 8, 2025
Submission Date	October 2, 2025
Acceptance Date	October 26, 2025
Published in Issue	Year 2025 Volume: 8 Issue: 2

Cite

APA	Ekincioğlu, P. İ., & Keser, S. (2025). Video Tabanlı Sınıf Yoklamasının Derin Öğrenme ve Makine Öğrenmesi Temelli Hibrit Bir Yaklaşımla Gerçek Zamanlı Olarak Elde Edilmesi. Sürdürülebilir Mühendislik Uygulamaları Ve Teknolojik Gelişmeler Dergisi, 8(2), 163-172. https://doi.org/10.51764/smutgd.1795569