LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION

Hacer Yalim Keles; Long Ang Lim

doi:10.33769/aupse.611958

Abstract

Semantic segmentation, one of the key problems in computer vision, has been applied in domains such as autonomous driving, robot navigation, and medical imaging. Recently, deep learning, especially deep neural networks, has shown significant performance improvements over conventional semantic segmentation methods. In this paper, we present a novel encoder-decoder deep neural network, XSeNet, that can be trained end-to-end in a supervised manner. We adapt ResNet-50 layers as the encoder and design a cascaded decoder composed of a stack of X-Modules, which enables the network to learn dense contextual information and gives it a wider field of view. We evaluate our method on the CamVid dataset; the experimental results show that it segments most parts of the scene accurately and even outperforms previous state-of-the-art methods.

Keywords

Semantic segmentation, deep learning, convolutional neural networks, pixel classification, autonomous driving

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Authors

Hacer Yalim Keles*
0000-0002-1671-4126
Türkiye

Long Ang Lim
Türkiye

Publication Date

June 30, 2020

Submission Date

August 27, 2019

Acceptance Date

February 4, 2020

Published in Issue

Year: 2020, Volume: 62, Number: 1

DOI

https://doi.org/10.33769/aupse.611958

IZ

https://izlik.org/JA24CJ89HH

Cite

APA

Yalim Keles, H., & Lim, L. A. (2020). LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, 62(1), 26-34. https://doi.org/10.33769/aupse.611958

AMA

1. Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020;62(1):26-34. doi:10.33769/aupse.611958

Chicago

Yalim Keles, Hacer, and Long Ang Lim. 2020. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62 (1): 26-34. https://doi.org/10.33769/aupse.611958.

EndNote

Yalim Keles H, Lim LA (June 1, 2020) LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62 1 26–34.

IEEE

[1] H. Yalim Keles and L. A. Lim, “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”, Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng., vol. 62, no. 1, pp. 26–34, June 2020, doi: 10.33769/aupse.611958.

ISNAD

Yalim Keles, Hacer - Lim, Long Ang. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62/1 (June 1, 2020): 26-34. https://doi.org/10.33769/aupse.611958.

JAMA

1. Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020;62:26–34.

MLA

Yalim Keles, Hacer, and Long Ang Lim. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, vol. 62, no. 1, June 2020, pp. 26-34, doi:10.33769/aupse.611958.

Vancouver

1. Hacer Yalim Keles, Long Ang Lim. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020 Jun. 1;62(1):26-34. doi:10.33769/aupse.611958