Balkan Journal of Electrical and Computer Engineering

ISSN: 2147-284X

MUSA YILMAZ

DOI: 10.17694/bajece.1345993

Subject: Computer Software

Vision Transformer Based Photo Capturing System

Abdülkadir Albayrak, Dicle University
ORCID: https://orcid.org/0000-0002-0738-871X

Volume 11, Issue 4, pages 316–321. Received 18 August 2023; published 22 December 2023.

2013


A portrait photograph is one of the most important documents that many people need for official transactions with public and private organizations. Despite advances in technology and the availability of high-resolution imaging devices, people still have to visit photography studios to have such photos taken. In this study, a Photo Capturing System is developed to provide infrastructure for web and mobile applications. After the system detects the person's face, facial orientation, and facial expression, it automatically takes a photo and sends it to a graphical user interface developed for this purpose, through which the photo is then printed automatically. The proposed study is novel in that it combines imaging technologies with deep learning and vision transformer algorithms, which have become popular image processing techniques in recent years. Face detection and facial expression recognition achieve success rates of nearly 100% and 95.52%, respectively. The performance of the Vision Transformer algorithm in facial expression recognition is also compared with that of state-of-the-art algorithms.
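The automatic capture step described in the abstract (detect the face, check its orientation and expression, then take the photo) can be sketched as a per-frame decision rule. This is a minimal sketch under stated assumptions, not the authors' implementation: the `FrameAnalysis` and `CaptureTrigger` names, the confidence and angle thresholds, the choice of "neutral" as the required expression, and the rule that several consecutive frames must qualify are all hypothetical. The upstream models (for example an SSD face detector and a ViT expression classifier) are assumed to have already produced the per-frame values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameAnalysis:
    """Per-frame outputs of the detection stages (hypothetical structure)."""
    face_score: float          # face-detector confidence in [0, 1]; 0.0 if no face
    yaw_deg: float             # head rotation left/right, in degrees
    pitch_deg: float           # head rotation up/down, in degrees
    expression: Optional[str]  # predicted expression label, e.g. "neutral"


class CaptureTrigger:
    """Fires once after `stable_frames` consecutive frames pass every check.

    All default thresholds are illustrative assumptions, not values from the paper.
    """

    def __init__(self, min_face_score=0.9, max_angle_deg=10.0,
                 required_expression="neutral", stable_frames=5):
        self.min_face_score = min_face_score
        self.max_angle_deg = max_angle_deg
        self.required_expression = required_expression
        self.stable_frames = stable_frames
        self._streak = 0  # number of consecutive qualifying frames seen so far

    def frame_ok(self, a: FrameAnalysis) -> bool:
        # 1) A face must be detected with sufficient confidence.
        if a.face_score < self.min_face_score:
            return False
        # 2) The face must be roughly frontal (small yaw and pitch).
        if abs(a.yaw_deg) > self.max_angle_deg or abs(a.pitch_deg) > self.max_angle_deg:
            return False
        # 3) The expression must match the one required for the document photo.
        return a.expression == self.required_expression

    def update(self, a: FrameAnalysis) -> bool:
        """Return True on the frame at which the photo should be taken."""
        self._streak = self._streak + 1 if self.frame_ok(a) else 0
        if self._streak >= self.stable_frames:
            self._streak = 0  # require a fresh stable run before the next capture
            return True
        return False


if __name__ == "__main__":
    trigger = CaptureTrigger(stable_frames=3)
    good = FrameAnalysis(0.97, 2.0, -1.5, "neutral")
    smiling = FrameAnalysis(0.97, 2.0, -1.5, "happy")
    frames = [good, good, smiling, good, good, good]
    print([trigger.update(f) for f in frames])
    # → [False, False, False, False, False, True]
```

Requiring a short run of qualifying frames, rather than a single one, is one simple way to avoid capturing a frame in which the person is blinking or still moving; the abstract does not state whether the original system uses such a rule.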

Keywords: vision transformer; facial expression recognition; photo capturing system; single shot detection; deep learning
