Research Article

Vision Transformer-Based Approach: A Novel Method for Object Recognition

Volume: 15 Number: 1 March 15, 2025

Abstract

This paper proposes a hybrid method to improve object recognition on inefficient and imbalanced datasets. The method uses the Vision Transformer (ViT) deep learning model together with several classical machine learning classifiers (LightGBM, AdaBoost, ExtraTrees, and Logistic Regression). The Caltech-101 dataset used in the study consists of low-resolution, noisy images and suffers from class imbalance. The method combines the feature extraction capabilities of the Vision Transformer with the robust classification performance of the classical classifiers. Experiments on Caltech-101 show that the proposed method achieves a precision of 92.3%, a recall of 89.7%, and an accuracy of 95.5%, which indicates that it is effective for object recognition on imbalanced datasets.
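The abstract reports accuracy above both precision and recall, a pattern that often appears on class-imbalanced data. The sketch below is only an illustration of that effect on a toy two-class example; it is not the authors' evaluation code. It assumes macro averaging (the abstract does not state which averaging scheme was used), and the labels, class sizes, and the helper name `macro_metrics` are hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # images per ground-truth class
    pred_count = Counter(y_pred)   # images per predicted class
    tp = Counter()                 # correct predictions per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1

    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging: every class contributes equally, regardless of its size.
    precision = sum(
        tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels
    ) / len(labels)
    recall = sum(
        tp[c] / true_count[c] if true_count[c] else 0.0 for c in labels
    ) / len(labels)
    return accuracy, precision, recall


# Toy imbalanced set (hypothetical): 90 "common" images and 10 "rare" images.
y_true = ["common"] * 90 + ["rare"] * 10
# Hypothetical predictions: every common image is correct, 4 of 10 rare images are missed.
y_pred = ["common"] * 90 + ["rare"] * 6 + ["common"] * 4

acc, prec, rec = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_precision={prec:.3f} macro_recall={rec:.3f}")
```

In this toy case the majority class dominates accuracy, while the missed minority-class images pull macro recall down, which is why reporting precision and recall alongside accuracy is informative for a dataset such as Caltech-101.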

Keywords

Object recognition, Vision Transformer, Logistic Regression, Caltech-101, Image Processing, Artificial Intelligence

APA
Ali, A. K. A. A., & Aydın, Y. (2025). Vision Transformer-Based Approach: A Novel Method for Object Recognition. Karadeniz Fen Bilimleri Dergisi, 15(1), 560-576. https://doi.org/10.31466/kfbd.1620640