LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION

Hacer Yalim Keles; Long Ang Lim

doi:10.33769/aupse.611958

Research Article

Year 2020, Volume 62, Issue 1, 26 - 34, 30.06.2020

Abstract

Semantic segmentation, one of the key problems in computer vision, has been applied in domains such as autonomous driving, robot navigation, and medical imaging. Recently, deep learning methods, especially deep neural networks, have shown significant performance improvements over conventional semantic segmentation methods. In this paper, we present a novel encoder-decoder deep neural network, XSeNet, that can be trained end-to-end in a supervised manner. We adapt ResNet-50 layers as the encoder and design a cascaded decoder that consists of a stack of X-Modules, which enables the network to learn dense contextual information and to have a wider field of view. We evaluate our method on the CamVid dataset; the experimental results show that it segments most parts of the scene accurately and even outperforms previous state-of-the-art methods.
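To make the "wider field of view" claim concrete, the short sketch below computes the receptive field of a stack of convolutional layers using the standard receptive-field recurrence. It is only an illustration: the abstract does not describe the internals of the X-Module, so the layer configurations (`plain`, `dilated`), the use of dilation, and the helper name `receptive_field` are assumptions made for this example, not the authors' design.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) of a stack of conv layers.

    Each layer is a (kernel_size, stride, dilation) tuple, listed input-first.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between neighbouring outputs so far
    for kernel_size, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel_size - 1) + 1 positions.
        effective_kernel = dilation * (kernel_size - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only for illustration (not the X-Module itself).
plain = [(3, 1, 1)] * 3                      # three ordinary 3x3 convolutions
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # same depth, growing dilation

print(receptive_field(plain))
print(receptive_field(dilated))
```

With the same number of layers and parameters, the dilated stack in this sketch covers a larger input window than the plain one, which is one common way a decoder can gather more context per output pixel.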

Keywords

Semantic segmentation, deep learning, convolutional neural networks, pixel classification, autonomous driving

There are 28 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Research Articles
Authors	Hacer Yalim Keles (ORCID: 0000-0002-1671-4126), Long Ang Lim
Publication Date	June 30, 2020
Submission Date	August 27, 2019
Acceptance Date	February 4, 2020
Published in Issue	Year 2020, Volume 62, Issue 1

Cite

APA	Yalim Keles, H., & Lim, L. A. (2020). LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, 62(1), 26-34. https://doi.org/10.33769/aupse.611958
AMA	Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. June 2020;62(1):26-34. doi:10.33769/aupse.611958
Chicago	Yalim Keles, Hacer, and Long Ang Lim. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62, no. 1 (June 2020): 26-34. https://doi.org/10.33769/aupse.611958.
EndNote	Yalim Keles H, Lim LA (June 1, 2020) LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62 1 26–34.
IEEE	H. Yalim Keles and L. A. Lim, “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”, Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng., vol. 62, no. 1, pp. 26–34, 2020, doi: 10.33769/aupse.611958.
ISNAD	Yalim Keles, Hacer - Lim, Long Ang. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62/1 (June 2020), 26-34. https://doi.org/10.33769/aupse.611958.
JAMA	Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020;62:26–34.
MLA	Yalim Keles, Hacer and Long Ang Lim. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, vol. 62, no. 1, 2020, pp. 26-34, doi:10.33769/aupse.611958.
Vancouver	Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020;62(1):26-34.

Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering is licensed under a Creative Commons Attribution 4.0 International License.