Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network

Tolga Inan; Umit Kacar

doi:10.17694/bajece.1024073

Abstract

Semantic segmentation is a fundamental problem in computer vision, and it is gaining importance in biometrics, where many successful recognition systems depend on a high-performance segmentation algorithm. In this study, we present an effective technique for ear segmentation in natural images. A convolutional neural network is trained for pixel-based ear segmentation: a DeepLab v3+ network, with ResNet-18 as the backbone and a Tversky loss function layer as the last layer, is trained with natural and uncontrolled images. We train the proposed network using only the 750 images in the Annotated Web Ears (AWE) training set. The corresponding tests are performed on the AWE test set, the University of Ljubljana test set, and Collection A of the In-The-Wild dataset. On the AWE dataset, the measured intersection over union (IoU) is 86.3%. To the best of our knowledge, this is the highest performance achieved among the algorithms tested on the AWE test set.
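
To make the two quantities named in the abstract concrete, the following is a minimal NumPy sketch of a Tversky loss and of IoU for a binary ear/background mask. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convention that `alpha` weights false positives and `beta` weights false negatives, the default weights, and the toy masks are all hypothetical, and the abstract does not state the weights used in the study.

```python
import numpy as np


def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Return 1 - Tversky index for a binary segmentation.

    pred   : predicted foreground probabilities in [0, 1]
    target : binary ground-truth mask (1 = ear, 0 = background)
    alpha  : weight on false positives (assumed convention)
    beta   : weight on false negatives (assumed convention)
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    tp = np.sum(pred * target)            # soft true positives
    fp = np.sum(pred * (1.0 - target))    # soft false positives
    fn = np.sum((1.0 - pred) * target)    # soft false negatives
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


def iou(pred_mask, target_mask):
    """Intersection over union of two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    target_mask = np.asarray(target_mask, dtype=bool)
    union = np.logical_or(pred_mask, target_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred_mask, target_mask).sum() / union


# Toy 2x3 masks (hypothetical data, not taken from any dataset).
target = np.array([[0, 1, 1], [0, 1, 0]])
pred = np.array([[0, 1, 1], [1, 0, 0]])

print("IoU:", iou(pred, target))
print("Tversky loss (alpha=beta=0.5):", tversky_loss(pred, target))
print("Tversky loss (alpha=beta=1.0):", tversky_loss(pred, target, 1.0, 1.0))
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient, and with `alpha = beta = 1` it reduces to IoU on hard masks; choosing `beta > alpha` penalizes missed ear pixels more heavily than false alarms, which is the usual motivation for this loss when the foreground is small relative to the image.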

Details

Primary Language

English

Subjects

Artificial Intelligence

Journal Section

Research Article

Authors

Tolga Inan *
0000-0002-8612-122X
Türkiye

Umit Kacar
0000-0003-1660-0775
Türkiye

Publication Date

July 30, 2022

Submission Date

November 15, 2021

Acceptance Date

July 18, 2022

Published in Issue

Year 2022 Volume: 10 Number: 3

DOI

https://doi.org/10.17694/bajece.1024073

IZ

https://izlik.org/JA43LW47FK

Cite

APA

Inan, T., & Kacar, U. (2022). Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network. Balkan Journal of Electrical and Computer Engineering, 10(3), 337-346. https://doi.org/10.17694/bajece.1024073

AMA

1. Inan T, Kacar U. Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network. Balkan Journal of Electrical and Computer Engineering. 2022;10(3):337-346. doi:10.17694/bajece.1024073

Chicago

Inan, Tolga, and Umit Kacar. 2022. “Ear Semantic Segmentation in Natural Images With Tversky Loss Function Supported DeepLabv3+ Convolutional Neural Network”. Balkan Journal of Electrical and Computer Engineering 10 (3): 337-46. https://doi.org/10.17694/bajece.1024073.

EndNote

Inan T, Kacar U (July 1, 2022) Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network. Balkan Journal of Electrical and Computer Engineering 10 3 337–346.

IEEE

[1] T. Inan and U. Kacar, “Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network”, Balkan Journal of Electrical and Computer Engineering, vol. 10, no. 3, pp. 337–346, July 2022, doi: 10.17694/bajece.1024073.

ISNAD

Inan, Tolga - Kacar, Umit. “Ear Semantic Segmentation in Natural Images With Tversky Loss Function Supported DeepLabv3+ Convolutional Neural Network”. Balkan Journal of Electrical and Computer Engineering 10/3 (July 1, 2022): 337-346. https://doi.org/10.17694/bajece.1024073.

JAMA

1. Inan T, Kacar U. Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network. Balkan Journal of Electrical and Computer Engineering. 2022;10:337–346.

MLA

Inan, Tolga, and Umit Kacar. “Ear Semantic Segmentation in Natural Images With Tversky Loss Function Supported DeepLabv3+ Convolutional Neural Network”. Balkan Journal of Electrical and Computer Engineering, vol. 10, no. 3, July 2022, pp. 337-46, doi:10.17694/bajece.1024073.

Vancouver

1. Tolga Inan, Umit Kacar. Ear semantic segmentation in natural images with Tversky loss function supported DeepLabv3+ convolutional neural network. Balkan Journal of Electrical and Computer Engineering. 2022 Jul. 1;10(3):337-46. doi:10.17694/bajece.1024073