Vision Transformer Based Photo Capturing System

Abdülkadir Albayrak

doi:10.17694/bajece.1345993

Araştırma Makalesi

Vision Transformer Based Photo Capturing System

Yıl 2023, Cilt: 11 Sayı: 4, 316 - 321, 22.12.2023

Abdülkadir Albayrak

https://doi.org/10.17694/bajece.1345993

Öz

Portrait photo is one of the most crucial documents that many people need for official transactions in many public and private organizations. Despite the developing technologies and high resolution imaging devices, people need such photographer offices to fulfil their needs to take photos. In this study, a Photo Capturing System has been developed to provide infrastructure for web and mobile applications. After the system detects the person's face, facial orientation and facial expression, it automatically takes a photo and sends it to a graphical user interface developed for this purpose. Then, with the help of the user interface of the photo taken by the system, it is automatically printed out. The proposed study is a unique study that uses imaging technologies, deep learning and vision transformer algorithms, which are very popular image processing techniques in several years. Within the scope of the study, face detection and facial expression recognition are performed with a success rate of close to 100\% and 95.52\%, respectively. In the study, the performances of Vision Transformer algorithm is also compared with the state of art algorithms in facial expression recognition.

Anahtar Kelimeler

Vision transformer, facial expression recognition, photo capturing system, single shot detection, deep learning

Kaynakça

[1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.
[2] S. Ren, X. Cao, Y. Wei, and J. Sun, “Face alignment at 3000 fps via regressing local binary features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1685–1692.
[3] O. Déniz, G. Bueno, J. Salido, and F. De la Torre, “Face recognition using histograms of oriented gradients,” Pattern recognition letters, vol. 32, no. 12, pp. 1598–1603, 2011.
[4] I. M. Revina and W. S. Emmanuel, “A survey on human face expression recognition techniques,” Journal of King Saud University-Computer and Information Sciences, vol. 33, no. 6, pp. 619–628, 2021.
[5] H. Li, M. Sui, F. Zhao, Z. Zha, and F. Wu, “Mvt: Mask vision transformer for facial expression recognition in the wild,” arXiv preprint arXiv:2106.04520, 2021.
[6] S. M. González-Lozoya, J. de la Calleja, L. Pellegrin, H. J. Escalante, M. Medina, A. Benitez-Ruiz et al., “Recognition of facial expressions based on cnn features,” Multimedia Tools and Applications, vol. 79, no. 19, pp. 13 987–14 007, 2020.
[7] D. O. Melinte and L. Vladareanu, “Facial expressions recognition for human–robot interaction using deep convolutional neural networks with rectified adam optimizer,” Sensors, vol. 20, no. 8, p. 2393, 2020.
[8] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM Computing Surveys (CSUR), 2021.
[9] M. M. Naseer, K. Ranasinghe, S. H. Khan, M. Hayat, F. Shahbaz Khan, and M.-H. Yang, “Intriguing properties of vision transformers,” Advances in Neural Information Processing Systems, vol. 34, 2021.
[10] P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional sift descriptor and its application to action recognition,” in Proceedings of the 15th ACM international conference on Multimedia, 2007, pp. 357–360.
[11] G. Zhao and M. Pietikainen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” IEEE transactions on pattern analysis and machine intelligence, vol. 29, no. 6, pp. 915–928, 2007.
[12] Z. Wang, S. Wang, and Q. Ji, “Capturing complex spatio-temporal relations among facial muscles for facial expression recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3422–3429.
[13] G. Littlewort, J. Whitehill, T. Wu, I. Fasel, M. Frank, J. Movellan, and M. Bartlett, “The computer expression recognition toolbox (cert),” in 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG). IEEE, 2011, pp. 298–305.
[14] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Conn, “Improved facial expression recognition via uni-hyperplane classification,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2554–2561.
[15] R. Ptucha, G. Tsagkatakis, and A. Savakis, “Manifold based sparse representation for robust expression recognition without neutral subtraction,” in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011, pp. 2136–2143.
[16] S. Jain, C. Hu, and J. K. Aggarwal, “Facial expression recognition with temporal modeling of shapes,” in 2011 IEEE international conference on computer vision workshops (ICCV workshops). IEEE, 2011, pp. 1642–1649.
[17] M. Liu, S. Shan, R. Wang, and X. Chen, “Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1749–1756.
[18] M. Liu, S. Li, S. Shan, and X. Chen, “Au-inspired deep networks for facial expression feature learning,” Neurocomputing, vol. 159, pp. 126–136, 2015.
[19] P. Liu, S. Han, Z. Meng, and Y. Tong, “Facial expression recognition via a boosted deep belief network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1805–1812.
[20] X. Sun, M. Lv, C. Quan, and F. Ren, “Improved facial expression recognition method based on roi deep convolutional neutral network,” in 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 2017, pp. 256–261

Yıl 2023, Cilt: 11 Sayı: 4, 316 - 321, 22.12.2023

Abdülkadir Albayrak

https://doi.org/10.17694/bajece.1345993

Öz

Kaynakça

[1] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.
[2] S. Ren, X. Cao, Y. Wei, and J. Sun, “Face alignment at 3000 fps via regressing local binary features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1685–1692.
[3] O. Déniz, G. Bueno, J. Salido, and F. De la Torre, “Face recognition using histograms of oriented gradients,” Pattern recognition letters, vol. 32, no. 12, pp. 1598–1603, 2011.
[4] I. M. Revina and W. S. Emmanuel, “A survey on human face expression recognition techniques,” Journal of King Saud University-Computer and Information Sciences, vol. 33, no. 6, pp. 619–628, 2021.
[5] H. Li, M. Sui, F. Zhao, Z. Zha, and F. Wu, “Mvt: Mask vision transformer for facial expression recognition in the wild,” arXiv preprint arXiv:2106.04520, 2021.
[6] S. M. González-Lozoya, J. de la Calleja, L. Pellegrin, H. J. Escalante, M. Medina, A. Benitez-Ruiz et al., “Recognition of facial expressions based on cnn features,” Multimedia Tools and Applications, vol. 79, no. 19, pp. 13 987–14 007, 2020.
[7] D. O. Melinte and L. Vladareanu, “Facial expressions recognition for human–robot interaction using deep convolutional neural networks with rectified adam optimizer,” Sensors, vol. 20, no. 8, p. 2393, 2020.
[8] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM Computing Surveys (CSUR), 2021.
[9] M. M. Naseer, K. Ranasinghe, S. H. Khan, M. Hayat, F. Shahbaz Khan, and M.-H. Yang, “Intriguing properties of vision transformers,” Advances in Neural Information Processing Systems, vol. 34, 2021.
[10] P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional sift descriptor and its application to action recognition,” in Proceedings of the 15th ACM international conference on Multimedia, 2007, pp. 357–360.
[11] G. Zhao and M. Pietikainen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” IEEE transactions on pattern analysis and machine intelligence, vol. 29, no. 6, pp. 915–928, 2007.
[12] Z. Wang, S. Wang, and Q. Ji, “Capturing complex spatio-temporal relations among facial muscles for facial expression recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3422–3429.
[13] G. Littlewort, J. Whitehill, T. Wu, I. Fasel, M. Frank, J. Movellan, and M. Bartlett, “The computer expression recognition toolbox (cert),” in 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG). IEEE, 2011, pp. 298–305.
[14] S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Conn, “Improved facial expression recognition via uni-hyperplane classification,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2554–2561.
[15] R. Ptucha, G. Tsagkatakis, and A. Savakis, “Manifold based sparse representation for robust expression recognition without neutral subtraction,” in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011, pp. 2136–2143.
[16] S. Jain, C. Hu, and J. K. Aggarwal, “Facial expression recognition with temporal modeling of shapes,” in 2011 IEEE international conference on computer vision workshops (ICCV workshops). IEEE, 2011, pp. 1642–1649.
[17] M. Liu, S. Shan, R. Wang, and X. Chen, “Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1749–1756.
[18] M. Liu, S. Li, S. Shan, and X. Chen, “Au-inspired deep networks for facial expression feature learning,” Neurocomputing, vol. 159, pp. 126–136, 2015.
[19] P. Liu, S. Han, Z. Meng, and Y. Tong, “Facial expression recognition via a boosted deep belief network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1805–1812.
[20] X. Sun, M. Lv, C. Quan, and F. Ren, “Improved facial expression recognition method based on roi deep convolutional neutral network,” in 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 2017, pp. 256–261

Toplam 20 adet kaynakça vardır.

Ayrıntılar

Birincil Dil	İngilizce
Konular	Bilgisayar Yazılımı
Bölüm	Araştırma Makalesi
Yazarlar	Abdülkadir Albayrak 0000-0002-0738-871X
Erken Görünüm Tarihi	10 Ocak 2024
Yayımlanma Tarihi	22 Aralık 2023
Yayımlandığı Sayı	Yıl 2023 Cilt: 11 Sayı: 4

Kaynak Göster

APA	Albayrak, A. (2023). Vision Transformer Based Photo Capturing System. Balkan Journal of Electrical and Computer Engineering, 11(4), 316-321. https://doi.org/10.17694/bajece.1345993

Kapak Resmi İndir

Makale Dosyaları

Tam Metin

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit and adapt the work provided the original work and source is appropriately cited. Creative Commons LisansÄ±