FashionCapsNet: Clothing Classification with Capsule Networks

Furkan Kınlı; Furkan Kıraç

doi:10.17671/gazibtd.580222

Research Article

Year 2020, Volume: 13 Issue: 1, 87 - 96, 31.01.2020

Cited By: 3

Abstract

Convolutional Neural Networks (CNNs) are among the most commonly used architectures in image-related deep learning studies. Despite their popularity, CNNs have intrinsic limitations: because of pooling operations, they lose some spatial information and are not robust to affine transformations. Capsule Networks, by contrast, are composed of groups of neurons and, with the help of their novel routing algorithms, can also learn the high-dimensional pose configuration of objects. In this study, we investigate the performance of Capsule Networks with the dynamic routing algorithm on the clothing classification task. To this end, we propose FashionCapsNet, a Capsule Network architecture with four stacked convolutional layers, and train it on the DeepFashion dataset, which contains 290k clothing images in 46 categories. We then compare its category classification results with those of state-of-the-art CNN-based methods trained on DeepFashion. In our experiments, FashionCapsNet achieves 83.81% top-3 accuracy and 89.83% top-5 accuracy on clothing classification. Based on these figures, FashionCapsNet clearly outperforms earlier methods that neglect pose configuration and performs comparably to the baseline study that uses additional landmark information to recover pose configuration. Finally, FashionCapsNet may gain a further performance boost on clothing classification as the relatively new field of Capsule Network research advances.
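The top-3 and top-5 figures reported in the abstract count a prediction as correct when the true category is among the three (or five) highest-scoring categories. The following is a minimal Python sketch of that metric, assuming per-category scores are available as plain lists. The function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for four images over three categories (not real model output).
toy_scores = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.7, 0.2, 0.1],
]
toy_labels = [1, 2, 0, 0]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_scores, toy_labels, k):.2f}")
```

With 46 categories, as in DeepFashion, the same function would be called with `k=3` and `k=5` on the model's 46-dimensional score vectors.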

Keywords

deep learning, capsule networks, clothing classification, fashion analysis

References

L. Bossard, M. Dantone, C. Leistner, C. Wengert, T. Quack, L. V. Gool, “Apparel Classification with Style”, Proceedings of the 11th Asian Conference on Computer Vision (ACCV), 321–335, 2012.
B. Willimon, I. Walker, S. Birchfield, “Classification of Clothing Using Midlevel Layers”, ISRN Robotics, 2013.
Z. Liu, P. Luo, S. Qiu, X. Wang, X. Tang, “DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
M. H. Kiapour, X. Han, S. Lazebnik, A. C. Berg, T. L. Berg, “Where To Buy It: Matching Street Clothing Photos in Online Shops”, IEEE International Conference on Computer Vision (ICCV), 2015.
S. Zheng, F. Yang, M. H. Kiapour, R. Piramuthu, “ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations”, Proceedings of ACM Multimedia conference (ACM Multimedia’18), 2018.
Z. Al-Halah, R. Stiefelhagen, K. Grauman, “Fashion Forward: Forecasting Visual Style in Fashion”, IEEE International Conference on Computer Vision (ICCV), 2017.
W. Di, C. Wah, A. Bhardwaj, R. Piramuthu, N. Sundaresan, “Style Finder: Fine-Grained Clothing Style Recognition and Retrieval”, IEEE International Workshop on Mobile Vision (IWMV), 2013.
N. Dalal, B. Triggs, “Histograms of Oriented Gradients for Human Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1, 886–893, 2005.
D. G. Lowe, “Distinctive Image Features from Scale-invariant Key-points”, International Journal of Computer Vision (IJCV), 60(2), 91–110, 2004.
A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information Processing Systems (NIPS), 1106–1114, 2012.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., “ImageNet Large Scale Visual Recognition Challenge (ILSVRC)”, International Journal of Computer Vision (IJCV), 2015.
K. Zhao, X. Hu, J. Bu, C. Wang, “Deep Style Match for Complementary Recommendation”, Workshops at the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
Yu, H. Zhang, X. He, X. Chen, L. Xiong, Z. Qin, “Aesthetic-based Clothing Recommendation”, WWW, 649–658, 2018.
H. Tuinhof, C. Pirker, M. Haltmeier, “Image-based Fashion Product Recommendation with Deep Learning”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
W. Luo, Y. Li, R. Urtasun, R. Zemel, “Understanding the Effective Receptive Field in Deep Convolutional Neural Networks”, Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), 4905–4913, 2016.
S. Sabour, N. Frosst, G. E. Hinton, “Dynamic Routing between Capsules”, Neural Information Processing Systems (NIPS), 3859–3869, 2017.
J. Lafferty, A. McCallum, F. C. N. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”, Proceedings of the 18th International Conference on Machine Learning (ICML), June 2001.
J. Huang, R. S. Feris, Q. Chen, S. Yan, “Cross-domain Image Retrieval with A Dual Attribute-aware Ranking Network”, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1062–1070, 2015.
M. Lin, Q. Chen, S. Yan, “Network in Network”, International Conference on Learning Representations (ICLR), 2014.
Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi, R. S. Feris, “Fully-adaptive Feature Sharing in Multi-task Networks with Applications in Person Attribute Classification”, Computing Research Repository (CoRR), abs/1611.05377, 2016.
C. Corbiere, H. Ben-Younes, A. Rame, C. Ollion, “Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction”, IEEE International Conference on Computer Vision (ICCV) workshop, 2017.
W. Wang, Y. Xu, J. Shen, S. C. Zhu, “Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4271–4280, 2018.
J. Liu, H. Lu, “Deep Fashion Analysis with Feature Map Upsampling and Landmark-driven Attention”, IEEE European Conference on Computer Vision (ECCV) workshop, 2019.
K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image Recognition”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778, 2016.
C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, “Inception-v4, Inception-ResNet and The Impact of Residual Connections on Learning”, AAAI, 4, 12, 2017.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., “Going Deeper With Convolutions”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9, 2015.
J. Redmon, S. K. Divvala, R. B. Girshick, A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788, 2016.
R. Girshick, “Fast R-CNN”, Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 1440–1448, 2015.
S. Ren, K. He, R. Girshick, J. Sun, “Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks”, Proceedings of the Neural Information Processing Systems (NIPS), 91–99, 2015.
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., “Generative adversarial nets”, Neural Information Processing Systems (NIPS), 2014.
A. Radford, L. Metz, S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, International Conference on Learning Representations (ICLR), 2016.
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford et al., “Improved Techniques for Training GANs”, Neural Information Processing Systems (NIPS), 2234–2242, 2016.
G. E. Hinton, A. Krizhevsky, S. D. Wang, “Transforming auto-encoders”, International Conference on Artificial Neural Networks (ICANN), Springer, 44–51, 2011.
K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, arXiv preprint, abs/1409.1, 1–10, 2014.
A. L. Maas, A. Y. Hannun, A. Y. Ng, “Rectifier Nonlinearities Improve Neural Network Acoustic Models”, International Conference on Machine Learning (ICML), 30, 2013.
S. Ioffe, C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, Proceedings of International Conference on Machine Learning (ICML), 37, 448–456, 2015.
D. P. Kingma, J. Ba, “ADAM: A Method for Stochastic Optimization”, 3rd International Conference for Learning Representations (ICLR), San Diego, 2014.
G. Hinton, S. Sabour, N. Frosst, “Matrix Capsules with EM Routing”, International Conference on Learning Representations (ICLR), 2018.
Y. Hu, X. Li, N. Zhou, L. Yang, L. Peng, S. Xiao, “A Sample Update-Based Convolutional Neural Network Framework for Object Detection in Large-Area Remote Sensing Images”, IEEE Geoscience and Remote Sensing Letters, 16(6), 947–951, 2019.
M. M. Ozguven, K. Adem, “Automatic detection and classification of leaf spot disease in sugar beet using deep learning algorithms”, Physica A: Statistical Mechanics and its Applications, 535, 122537, 2019.
Y. Wei, X. Liu, “Dangerous goods detection based on transfer learning in X-ray images”, Neural Computing and Applications, 1–14, 2019.

There are 41 citations in total.

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Articles
Authors	Furkan Kınlı 0000-0002-9192-6583; Furkan Kıraç 0000-0001-9177-0489
Publication Date	January 31, 2020
Submission Date	June 20, 2019
Published in Issue	Year 2020 Volume: 13 Issue: 1

Cite

APA	Kınlı, F., & Kıraç, F. (2020). FashionCapsNet: Clothing Classification with Capsule Networks. Bilişim Teknolojileri Dergisi, 13(1), 87-96. https://doi.org/10.17671/gazibtd.580222

Cited By

Classification of Real and Fake Face Data Using Capsule Networks

Fırat Üniversitesi Mühendislik Bilimleri Dergisi

https://doi.org/10.35234/fumbd.1219227

Generalization to unseen viewpoint images of objects via alleviated pose attentive capsule agreement

Neural Computing and Applications

https://doi.org/10.1007/s00521-022-07900-3

Kapsül Ağları ile Yüz Verilerinin Sınıflandırılması

European Journal of Science and Technology

https://doi.org/10.31590/ejosat.999055
