<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article  article-type="research-article"        dtd-version="1.4">
            <front>

                <journal-meta>
                                    <journal-id></journal-id>
            <journal-title-group>
                                                                                    <journal-title>Balkan Journal of Electrical and Computer Engineering</journal-title>
            </journal-title-group>
                            <issn pub-type="ppub">2147-284X</issn>
                                        <issn pub-type="epub">2147-284X</issn>
                                                                                            <publisher>
                    <publisher-name>MUSA YILMAZ</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.17694/bajece.1345993</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Computer Software</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Bilgisayar Yazılımı</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                        <article-title>Vision Transformer Based Photo Capturing System</article-title>
                                                                                                                                        </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">
                                        https://orcid.org/0000-0002-0738-871X</contrib-id>
                                                                <name>
                                    <surname>Albayrak</surname>
                                    <given-names>Abdülkadir</given-names>
                                </name>
                                                                    <aff>DICLE UNIVERSITY</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="20231222">
                    <day>12</day>
                    <month>22</month>
                    <year>2023</year>
                </pub-date>
                                        <volume>11</volume>
                                        <issue>4</issue>
                                        <fpage>316</fpage>
                                        <lpage>321</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20230818">
                        <day>08</day>
                        <month>18</month>
                        <year>2023</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20231222">
                        <day>12</day>
                        <month>22</month>
                        <year>2023</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2013, Balkan Journal of Electrical and Computer Engineering</copyright-statement>
                    <copyright-year>2013</copyright-year>
                    <copyright-holder>Balkan Journal of Electrical and Computer Engineering</copyright-holder>
                </permissions>
            
                                                                                                <abstract><p>Portrait photo is one of the most crucial documents that many people need for official transactions in many public and private organizations. Despite the developing technologies and high resolution imaging devices, people need such photographer offices to fulfil their needs to take photos. In this study, a Photo Capturing System has been developed to provide infrastructure for web and mobile applications. After the system detects the person&#039;s face, facial orientation and facial expression, it automatically takes a photo and sends it to a graphical user interface developed for this purpose. Then, with the help of the user interface of the photo taken by the system, it is automatically printed out. The proposed study is a unique study that uses imaging technologies, deep learning and vision transformer algorithms, which are very popular image processing techniques in several years. Within the scope of the study, face detection and facial expression recognition are performed with a success rate of close to 100\% and 95.52\%, respectively. In the study, the performances of Vision Transformer algorithm is also compared with the state of art algorithms in facial expression recognition.</p></abstract>
                                                                                    
            
                                                            <kwd-group>
                                                    <kwd>Vision transformer</kwd>
                                                    <kwd>  facial expression recognition</kwd>
                                                    <kwd>  photo capturing system</kwd>
                                                    <kwd>  single shot detection</kwd>
                                                    <kwd>  deep learning</kwd>
                                            </kwd-group>
                                                        
                                                                                                                                                    </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">[1]	W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">[2]	S. Ren, X. Cao, Y. Wei, and J. Sun, “Face alignment at 3000 fps via regressing local binary features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1685–1692.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">[3]	O. Déniz, G. Bueno, J. Salido, and F. De la Torre, “Face recognition using histograms of oriented gradients,” Pattern recognition letters, vol. 32, no. 12, pp. 1598–1603, 2011.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">[4]	I. M. Revina and W. S. Emmanuel, “A survey on human face expression recognition techniques,” Journal of King Saud University-Computer and Information Sciences, vol. 33, no. 6, pp. 619–628, 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">[5]	H. Li, M. Sui, F. Zhao, Z. Zha, and F. Wu, “Mvt: Mask vision transformer for facial expression recognition in the wild,” arXiv preprint arXiv:2106.04520, 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">[6]	S. M. González-Lozoya, J. de la Calleja, L. Pellegrin, H. J. Escalante, M. Medina, A. Benitez-Ruiz et al., “Recognition of facial expressions based on cnn features,” Multimedia Tools and Applications, vol. 79, no. 19, pp. 13 987–14 007, 2020.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">[7]	D. O. Melinte and L. Vladareanu, “Facial expressions recognition for human–robot interaction using deep convolutional neural networks with rectified adam optimizer,” Sensors, vol. 20, no. 8, p. 2393, 2020.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">[8]	S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM Computing Surveys (CSUR), 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">[9]	M. M. Naseer, K. Ranasinghe, S. H. Khan, M. Hayat, F. Shahbaz Khan, and M.-H. Yang, “Intriguing properties of vision transformers,” Advances in Neural Information Processing Systems, vol. 34, 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">[10]	P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional sift descriptor and its application to action recognition,” in Proceedings of the 15th ACM international conference on Multimedia, 2007, pp. 357–360.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">[11]	G. Zhao and M. Pietikainen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” IEEE transactions on pattern analysis and machine intelligence, vol. 29, no. 6, pp. 915–928, 2007.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">[12]	Z. Wang, S. Wang, and Q. Ji, “Capturing complex spatio-temporal relations among facial muscles for facial expression recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 3422–3429.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">[13]	G. Littlewort, J. Whitehill, T. Wu, I. Fasel, M. Frank, J. Movellan, and M. Bartlett, “The computer expression recognition toolbox (cert),” in 2011 IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG). IEEE, 2011, pp. 298–305.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">[14]	 S. W. Chew, S. Lucey, P. Lucey, S. Sridharan, and J. F. Conn, “Improved facial expression recognition via uni-hyperplane classification,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2554–2561.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">[15]	R. Ptucha, G. Tsagkatakis, and A. Savakis, “Manifold based sparse representation for robust expression recognition without neutral subtraction,” in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011, pp. 2136–2143.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">[16]	S. Jain, C. Hu, and J. K. Aggarwal, “Facial expression recognition with temporal modeling of shapes,” in 2011 IEEE international conference on computer vision workshops (ICCV workshops). IEEE, 2011, pp. 1642–1649.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">[17]	 M. Liu, S. Shan, R. Wang, and X. Chen, “Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1749–1756.</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">[18]	 M. Liu, S. Li, S. Shan, and X. Chen, “Au-inspired deep networks for facial expression feature learning,” Neurocomputing, vol. 159, pp. 126–136, 2015.</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">[19]	 P. Liu, S. Han, Z. Meng, and Y. Tong, “Facial expression recognition via a boosted deep belief network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1805–1812.</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">[20]	 X. Sun, M. Lv, C. Quan, and F. Ren, “Improved facial expression recognition method based on roi deep convolutional neutral network,” in 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 2017, pp. 256–261</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
