In this study, we investigated the performance of art painting classification across the categories of "Genre", "Artist", and "Style" using deep learning methods. We employed convolutional neural network (CNN)-based models, namely ResNet, MobileNet, EfficientNet, and ConvNeXt, for their effectiveness in feature extraction. Additionally, we used vision transformer models, including ViT, Swin, BEiT, and DeiT, which rely on attention mechanisms. We conducted our experiments on the publicly available WikiArt dataset. The BEiT model achieved the highest classification accuracy in the Artist and Genre categories, with results of 84.90% and 79.52%, respectively. In the Style category, the Swin model produced the best result with an accuracy of 72.59%. Overall, our findings indicate that transformer-based methods outperformed CNN-based methods. Furthermore, we compared our results with similar studies in the literature and showed that transformer-based models generally perform better in classifying art paintings.
| Primary Language | English |
|---|---|
| Subjects | Computer Vision |
| Journal Section | Research Article |
| Early Pub Date | June 9, 2025 |
| Publication Date | June 27, 2025 |
| Submission Date | December 3, 2024 |
| Acceptance Date | December 26, 2024 |
| Published in Issue | Year 2025, Volume 3, Issue 1 |