In this study, we investigated the performance of art painting classification across the categories of "Genre", "Artist", and "Style" using deep learning methods. We employed convolutional neural network (CNN)-based models, namely ResNet, MobileNet, EfficientNet, and ConvNeXt, for their effectiveness in feature extraction. Additionally, we used vision transformer models, including ViT, Swin, BEiT, and DeiT, which rely on attention mechanisms. We conducted our experiments on the publicly available WikiArt dataset. The BEiT model achieved the highest classification accuracy in the Artist and Genre categories, with results of 84.90% and 79.52%, respectively. In the Style category, the Swin model produced the best result with an accuracy of 72.59%. Overall, our findings indicate that transformer-based methods outperformed CNN-based methods. Furthermore, we compared our results with similar studies in the literature and showed that transformer-based models generally perform better in classifying art paintings.
| Primary Language | English |
|---|---|
| Subjects | Computer Vision |
| Journal Section | Research Article |
| Early Pub Date | June 9, 2025 |
| Publication Date | June 27, 2025 |
| Submission Date | December 3, 2024 |
| Acceptance Date | December 26, 2024 |
| Published in Issue | Year 2025, Volume 3, Issue 1 |