Vision Transformer-Based Approach: A Novel Method for Object Recognition

Ali Khudhair Abbas Ali Ali; Yıldız Aydın

doi:10.31466/kfbd.1620640

Abstract

This paper proposes a hybrid method for improving object recognition on inefficient and imbalanced datasets. The method uses the Vision Transformer (ViT) deep learning model for feature extraction and several classical machine learning classifiers (LightGBM, AdaBoost, ExtraTrees, and Logistic Regression) for classification. The Caltech-101 dataset used in the study is a low-resolution, noisy image dataset with class imbalance problems. Combining the feature extraction capabilities of ViT with the robust classification performance of the classical classifiers improves recognition results. In experiments on Caltech-101, the proposed method achieves a precision of 92.3%, a recall of 89.7%, and an accuracy of 95.5%, which indicates that it is effective for object recognition on imbalanced datasets.
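The abstract reports precision, recall, and accuracy as three separate figures. On an imbalanced dataset these can diverge, because accuracy is dominated by the large classes while per-class averages expose errors on the small ones. The sketch below illustrates this on a toy example. It is not the authors' evaluation code: the abstract does not say how precision and recall were averaged across classes, so macro averaging is an assumption here, and the function name `classification_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro_precision, macro_recall) for two label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter()            # correct predictions per class
    pred_count = Counter(y_pred)    # how often each class was predicted
    true_count = Counter(y_true)    # how often each class actually occurs

    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1

    accuracy = sum(true_pos.values()) / len(y_true)

    # Macro averaging (an assumption): compute the metric per class, then take
    # the unweighted mean. A class with a zero denominator contributes 0.0.
    precisions = [true_pos[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels]
    recalls = [true_pos[c] / true_count[c] if true_count[c] else 0.0 for c in labels]
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical imbalanced toy data: 8 "airplane" images and 2 "anchor" images.
y_true = ["airplane"] * 8 + ["anchor"] * 2
# The classifier gets every airplane right but only one of the two anchors.
y_pred = ["airplane"] * 8 + ["anchor", "airplane"]

acc, prec, rec = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# prints: accuracy=0.900 precision=0.944 recall=0.750
```

A single missed image in the minority class leaves accuracy at 90% but pulls macro recall down to 75%, which is why reporting all three metrics is informative for a class-imbalanced dataset such as Caltech-101.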

Keywords

Object recognition, Vision Transformer, Logistic Regression, Caltech 101, Image Processing, Artificial Intelligence

Görsel Dönüştürücü Tabanlı Yaklaşım: Nesne Tanıma için Yeni Bir Yöntem

Abstract

This paper proposes a hybrid method to improve object recognition applications on inefficient and imbalanced datasets. The proposed method aims to increase object recognition performance by using the Vision Transformer (ViT) deep learning model and several classical machine learning classifiers (LightGBM, AdaBoost, ExtraTrees, and Logistic Regression). The Caltech-101 dataset used in the study is a low-resolution, noisy image dataset with class imbalance problems. Our method achieves better results by combining the feature extraction capabilities of the Vision Transformer model with the robust classification performance of classical machine learning classifiers. Experiments on the Caltech-101 dataset show that the proposed method achieves a precision of 92.3% and a recall of 89.7% and performs significantly better than other state-of-the-art methods. The experimental results show that the proposed method outperforms other existing methods and provides significant improvements in object recognition tasks.

Keywords

Object recognition, Vision Transformer, Logistic Regression, Caltech 101, Image Processing, Artificial Intelligence

Details

Primary Language

English

Subjects

Software Engineering (Other)

Journal Section

Research Article

Authors

Ali Khudhair Abbas Ali Ali
0009-0001-9843-1735
Iraq

Yıldız Aydın*
0000-0002-3877-6782
Türkiye

Publication Date

March 15, 2025

Submission Date

January 15, 2025

Acceptance Date

March 5, 2025

Published in Issue

Year: 2025, Volume: 15, Number: 1

DOI

https://doi.org/10.31466/kfbd.1620640

IZ

https://izlik.org/JA29XM46KN

APA

Ali, A. K. A. A., & Aydın, Y. (2025). Vision Transformer-Based Approach: A Novel Method for Object Recognition. Karadeniz Fen Bilimleri Dergisi, 15(1), 560-576. https://doi.org/10.31466/kfbd.1620640

AMA

1. Ali AKAA, Aydın Y. Vision Transformer-Based Approach: A Novel Method for Object Recognition. KFBD. 2025;15(1):560-576. doi:10.31466/kfbd.1620640

Chicago

Ali, Ali Khudhair Abbas Ali, and Yıldız Aydın. 2025. “Vision Transformer-Based Approach: A Novel Method for Object Recognition”. Karadeniz Fen Bilimleri Dergisi 15 (1): 560-76. https://doi.org/10.31466/kfbd.1620640.

EndNote

Ali AKAA, Aydın Y (March 1, 2025) Vision Transformer-Based Approach: A Novel Method for Object Recognition. Karadeniz Fen Bilimleri Dergisi 15 1 560–576.

IEEE

[1] A. K. A. A. Ali and Y. Aydın, “Vision Transformer-Based Approach: A Novel Method for Object Recognition”, KFBD, vol. 15, no. 1, pp. 560–576, Mar. 2025, doi: 10.31466/kfbd.1620640.

ISNAD

Ali, Ali Khudhair Abbas Ali - Aydın, Yıldız. “Vision Transformer-Based Approach: A Novel Method for Object Recognition”. Karadeniz Fen Bilimleri Dergisi 15/1 (March 1, 2025): 560-576. https://doi.org/10.31466/kfbd.1620640.

JAMA

1. Ali AKAA, Aydın Y. Vision Transformer-Based Approach: A Novel Method for Object Recognition. KFBD. 2025;15:560–576.

MLA

Ali, Ali Khudhair Abbas Ali, and Yıldız Aydın. “Vision Transformer-Based Approach: A Novel Method for Object Recognition”. Karadeniz Fen Bilimleri Dergisi, vol. 15, no. 1, Mar. 2025, pp. 560-76, doi:10.31466/kfbd.1620640.

Vancouver

1. Ali Khudhair Abbas Ali Ali, Yıldız Aydın. Vision Transformer-Based Approach: A Novel Method for Object Recognition. KFBD. 2025 Mar. 1;15(1):560-76. doi:10.31466/kfbd.1620640

Cited By

Fidye Yazılımlarında Anomali ve Sahte İmza Tespiti: Zaman-Frekans Dönüşümü ve Transformer Tabanlı Analiz Modeli

Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi

https://doi.org/10.54365/adyumbd.1805519

The published articles in The Black Sea Journal of Sciences are licensed under Creative Commons Attribution-NonCommercial 4.0 International