Research Article

LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION

Year 2020, Volume: 62 Issue: 1, 26 - 34, 30.06.2020
https://doi.org/10.33769/aupse.611958

Abstract

Semantic segmentation, one of the key problems in computer vision, has been applied in various domains such as autonomous driving, robot navigation, and medical imaging, to name a few. Recently, deep learning, especially deep neural networks, has shown significant performance improvements over conventional semantic segmentation methods. In this paper, we present a novel encoder-decoder type deep neural network-based method, namely XSeNet, that can be trained end-to-end in a supervised manner. We adapt ResNet-50 layers as the encoder and design a cascaded decoder composed of a stack of X-Modules, which enables the network to learn dense contextual information and to have a wider field-of-view. We evaluate our method on the CamVid dataset; experimental results reveal that it segments most parts of the scene accurately and even outperforms previous state-of-the-art methods.
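To make the recipe the abstract describes concrete, here is a minimal PyTorch sketch of an encoder-decoder segmentation network with a ResNet-50 encoder and a cascaded decoder of stacked context modules. The abstract does not specify the internals of the X-Module, so `XModuleSketch`, its dilation rates, and the channel widths below are illustrative assumptions, not the authors' actual architecture.

```python
# Hedged sketch: ResNet-50 encoder + cascaded decoder of dilated-convolution
# modules, loosely following the abstract. The X-Module internals are NOT
# given on this page; XModuleSketch is a hypothetical stand-in.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class XModuleSketch(nn.Module):
    """Assumed dense-context block: parallel dilated 3x3 convolutions
    (widening the field-of-view) whose outputs are fused by a 1x1 conv."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class EncoderDecoderSketch(nn.Module):
    def __init__(self, num_classes=11):  # CamVid is commonly used with 11 classes
        super().__init__()
        backbone = resnet50(weights=None)
        # Encoder: ResNet-50 up to the last residual stage (overall stride 32).
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Decoder: cascade of context modules, each followed by 2x upsampling.
        chans = [2048, 512, 256, 128, 64, 32]
        stages = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            stages += [XModuleSketch(c_in, c_out),
                       nn.Upsample(scale_factor=2, mode="bilinear",
                                   align_corners=False)]
        self.decoder = nn.Sequential(*stages)
        self.classifier = nn.Conv2d(chans[-1], num_classes, 1)

    def forward(self, x):
        return self.classifier(self.decoder(self.encoder(x)))

# End-to-end supervised training reduces to per-pixel cross-entropy:
model = EncoderDecoderSketch()
logits = model(torch.randn(1, 3, 352, 480))  # CamVid-like size, divisible by 32
target = torch.randint(0, 11, (1, 352, 480))
loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, loss.item())  # torch.Size([1, 11, 352, 480])
```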

References

  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
  • M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks," in European Conference on Computer Vision, pp. 818-833, Springer, 2014.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
  • K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
  • J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.
  • A. Karpathy and L. Fei-Fei, “Deep visual-semantic alignments for generating image descriptions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128-3137, 2015.
  • V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation," arXiv preprint arXiv:1511.00561, 2015.
  • O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, Springer, 2015.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
  • M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
  • C. Liu, J. Yuen, and A. Torralba, “SIFT Flow: Dense correspondence across scenes and its applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978-994, 2011.
  • H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1520-1528, 2015.
  • G. J. Brostow, J. Fauqueur, and R. Cipolla, “Semantic object classes in video: A high-definition ground truth database," Pattern Recognition Letters, vol. 30, no. 2, pp. 88-97, 2009.
  • S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1175-1183, IEEE, 2017.
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks," in CVPR, vol. 1, p. 3, 2017.
  • S. H. Raza, M. Grundmann, and I. Essa, “Geometric context from videos," in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3081-3088, IEEE, 2013.
  • A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “ENet: A deep neural network architecture for real-time semantic segmentation," arXiv preprint arXiv:1606.02147, 2016.
  • K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
  • I. Ardiyanto and T. B. Adji, “Deep residual coalesced convolutional network for efficient semantic road segmentation," IPSJ Transactions on Computer Vision and Applications, vol. 9, no. 1, p. 6, 2017.
  • F. Visin, M. Ciccone, A. Romero, K. Kastner, K. Cho, Y. Bengio, M. Matteucci, and A. Courville, “ReSeg: A recurrent neural network-based model for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 41-48, 2016.
  • R. P. Poudel, S. Liwicki, and R. Cipolla, “Fast-SCNN: Fast semantic segmentation network," arXiv preprint arXiv:1902.04502, 2019.
  • F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
  • L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.
  • L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018.
  • L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation," arXiv preprint arXiv:1802.02611, 2018.
  • S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
  • J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, “Efficient object localization using convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648-656, 2015.
  • F. Yu, V. Koltun, and T. A. Funkhouser, “Dilated residual networks," in CVPR, vol. 2, p. 3, 2017.

Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Hacer Yalim Keles (ORCID: 0000-0002-1671-4126)

Long Ang Lim

Publication Date June 30, 2020
Submission Date August 27, 2019
Acceptance Date February 4, 2020
Published in Issue Year 2020 Volume: 62 Issue: 1

Cite

APA Yalim Keles, H., & Lim, L. A. (2020). LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, 62(1), 26-34. https://doi.org/10.33769/aupse.611958
AMA Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. June 2020;62(1):26-34. doi:10.33769/aupse.611958
Chicago Yalim Keles, Hacer, and Long Ang Lim. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62, no. 1 (June 2020): 26-34. https://doi.org/10.33769/aupse.611958.
EndNote Yalim Keles H, Lim LA (June 1, 2020) LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62 1 26–34.
IEEE H. Yalim Keles and L. A. Lim, “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”, Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng., vol. 62, no. 1, pp. 26–34, 2020, doi: 10.33769/aupse.611958.
ISNAD Yalim Keles, Hacer - Lim, Long Ang. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering 62/1 (June 2020), 26-34. https://doi.org/10.33769/aupse.611958.
JAMA Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020;62:26–34.
MLA Yalim Keles, Hacer and Long Ang Lim. “LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION”. Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, vol. 62, no. 1, 2020, pp. 26-34, doi:10.33769/aupse.611958.
Vancouver Yalim Keles H, Lim LA. LEARNING DENSE CONTEXTUAL FEATURES FOR SEMANTIC SEGMENTATION. Commun.Fac.Sci.Univ.Ank.Series A2-A3: Phys.Sci. and Eng. 2020;62(1):26-34.

Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering


This work is licensed under a Creative Commons Attribution 4.0 International License.