Research Article

Vision Transformer-Based Approach: A Novel Method for Object Recognition

Year 2025, Volume: 15, Issue: 1, 560–576, 15.03.2025
https://doi.org/10.31466/kfbd.1620640

Abstract

This paper proposes a hybrid method to improve object recognition on low-quality and imbalanced datasets. The method combines the Vision Transformer (ViT) deep learning model with several classical machine learning classifiers (LightGBM, AdaBoost, ExtraTrees, and Logistic Regression) to enhance recognition performance. The Caltech-101 dataset used in the study is a low-resolution, noisy image dataset with class imbalance problems. Our approach achieves better results by combining the feature extraction capabilities of the Vision Transformer with the robust classification performance of classical machine learning classifiers. Experiments on the Caltech-101 dataset show that the proposed method achieves a precision of 92.3%, a recall of 89.7%, and an accuracy of 95.5%, highlighting its effectiveness in addressing the challenges of object recognition on imbalanced datasets.
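
The hybrid pipeline described above, a pretrained ViT used as a frozen feature extractor followed by a classical classifier, can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' published code: the torchvision ViT-B/16 backbone, the torchvision Caltech101 loader, the 80/20 stratified split, and the choice of Logistic Regression (one of the four classifiers listed in the abstract) are all assumptions made for the sketch.

import numpy as np
import torch
from torchvision import datasets
from torchvision.models import vit_b_16, ViT_B_16_Weights
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen ViT-B/16 backbone: replacing the classification head with Identity
# makes the model return the 768-dimensional class-token embedding instead of
# ImageNet logits, so it acts purely as a feature extractor.
weights = ViT_B_16_Weights.IMAGENET1K_V1
vit = vit_b_16(weights=weights).to(device).eval()
vit.heads = torch.nn.Identity()
preprocess = weights.transforms()

def transform(img):
    # Caltech-101 contains some grayscale images; convert to RGB before the
    # ViT preprocessing (resize, crop, normalize) expected by the backbone.
    return preprocess(img.convert("RGB"))

dataset = datasets.Caltech101(root="data", download=True, transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=False)

# Extract features once; the classical classifier is trained on these vectors.
features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        feats = vit(images.to(device))        # shape: (batch, 768)
        features.append(feats.cpu().numpy())
        labels.append(targets.numpy())
X, y = np.concatenate(features), np.concatenate(labels)

# Classical classifier on the frozen ViT features (Logistic Regression here).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_te, pred, average="macro", zero_division=0))

Because the frozen ViT features are computed once and reused, swapping LogisticRegression for lightgbm.LGBMClassifier, sklearn's AdaBoostClassifier, or ExtraTreesClassifier changes only the last few lines of the sketch.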

References

  • Amerini, I., Ballan, L., Caldelli, R., Del Bimbo, A., & Serra, G. (2011). A SIFT-based forensic method for copy-move attack detection and transformation recovery. IEEE Transactions on Information Forensics and Security, 6(3, Part 2), 1099–1110. https://doi.org/10.1109/TIFS.2011.2129512
  • Bansal, M., Kumar, M., & Kumar, M. (2021). 2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors. Multimedia Tools and Applications, 80(12), 18839–18857. https://doi.org/10.1007/s11042-021-10646-0
  • Bansal, M., Kumar, M., Kumar, M., & Kumar, K. (2021). An efficient technique for object recognition using Shi-Tomasi corner detection algorithm. Soft Computing, 25(6), 4423–4432. https://doi.org/10.1007/s00500-020-05453-y
  • Bosch, A., Zisserman, A., & Muñoz, X. (2007). Image classification using random forests and ferns. Proceedings of the IEEE International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2007.4409066
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. ICLR 2021 - 9th International Conference on Learning Representations.
  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Pattern Recognition Workshop, 178. https://doi.org/10.1016/j.cviu.2005.09.012
  • Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4), 594–611. https://doi.org/10.1109/TPAMI.2006.79
  • Gupta, S., Kumar, M., & Garg, A. (2019). Improved object recognition results using SIFT and ORB feature detector. Multimedia Tools and Applications, 78(23), 34157–34171. https://doi.org/10.1007/s11042-019-08232-6
  • Hussain, N., Khan, M. A., Sharif, M., Khan, S. A., Albesher, A. A., Saba, T., & Armaghan, A. (2024). A deep neural network and classical features based scheme for objects recognition: an application for machine inspection. Multimedia Tools and Applications, 83(5), 14935–14957. https://doi.org/10.1007/s11042-020-08852-3
  • Jalal, A., Ahmed, A., Rafique, A. A., & Kim, K. (2021). Scene Semantic Recognition Based on Modified Fuzzy C-Mean and Maximum Entropy Using Object-to-Object Relations. IEEE Access, 9, 27758–27772. https://doi.org/10.1109/ACCESS.2021.3058986
  • Jin, J., Chen, G., Meng, X., Zhang, Y., Shi, W., Li, Y., … Jiang, W. (2022). Prediction of river damming susceptibility by landslides based on a logistic regression model and InSAR techniques: A case study of the Bailong River Basin, China. Engineering Geology, 299(February). https://doi.org/10.1016/j.enggeo.2022.106562
  • Karadağ, C., & Özdemir, D. (2022). Böbrek tümörü tespiti için derin öğrenme yöntemlerinin karşılaştırmalı analizi [Comparative analysis of deep learning methods for kidney tumor detection]. 6(6), 10–23.
  • Keerthana, D., Venugopal, V., Nath, M. K., & Mishra, M. (2023). Hybrid convolutional neural networks with SVM classifier for classification of skin cancer. Biomedical Engineering Advances, 5(December 2022), 100069. https://doi.org/10.1016/j.bea.2022.100069
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
  • Liu, P., Guo, J. M., Chamnongthai, K., & Prasetyo, H. (2017). Fusion of color histogram and LBP-based features for texture image retrieval and classification. Information Sciences, 390, 95–111. https://doi.org/10.1016/j.ins.2017.01.025
  • Mutch, J., & Lowe, D. G. (2006). Multiclass object recognition with sparse, localized features. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1, 11–18. https://doi.org/10.1109/CVPR.2006.200
  • Naseer, A., Almujally, N. A., Alotaibi, S. S., Alazeb, A., & Park, J. (2024). Efficient Object Segmentation and Recognition Using Multi-Layer Perceptron Networks. Computers, Materials and Continua, 78(1), 1381–1398. https://doi.org/10.32604/cmc.2023.042963
  • Naseer, A., Alzahrani, H. A., Almujally, N. A., Nowaiser, K. Al, Mudawi, N. Al, Algarni, A., & Park, J. (2024). Efficient Multi-Object Recognition Using GMM Segmentation Feature Fusion Approach. IEEE Access, 12, 37165–37178. https://doi.org/10.1109/ACCESS.2024.3372190
  • Naseer, A., Mudawi, N. Al, Abdelhaq, M., Alonazi, M., Alazeb, A., Algarni, A., & Jalal, A. (2024). CNN-Based Object Detection via Segmentation Capabilities in Outdoor Natural Scenes. IEEE Access, 12(June), 84984–85000. https://doi.org/10.1109/ACCESS.2024.3413848
  • Rafique, A. A., Gochoo, M., Jalal, A., & Kim, K. (2023). Maximum entropy scaled super pixels segmentation for multi-object detection and scene recognition via deep belief network. Multimedia Tools and Applications, 82(9), 13401–13430. https://doi.org/10.1007/s11042-022-13717-y
  • Sikder, J., Islam, M. K., & Jahan, F. (2024). Object segmentation for image indexing in large database. Journal of King Saud University - Computer and Information Sciences, 36(2), 101937. https://doi.org/10.1016/j.jksuci.2024.101937
  • Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. 36th International Conference on Machine Learning, ICML 2019, 2019-June, 10691–10700.
  • Telceken, M., & Kutlu, Y. (2022). Detecting Tagged People in Camera Images. Journal of Intelligent Systems with Applications, (May), 27–32. https://doi.org/10.54856/jiswa.202205197
  • Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021). Training data-efficient image transformers & distillation through attention. Proceedings of Machine Learning Research, 139, 10347–10357.
  • Venugopal, V., Joseph, J., Vipin Das, M., & Kumar Nath, M. (2022). An EfficientNet-based modified sigmoid transform for enhancing dermatological macro-images of melanoma and nevi skin lesions. Computer Methods and Programs in Biomedicine, 222, 106935. https://doi.org/10.1016/j.cmpb.2022.106935
  • Zhang, L., & Pu, J. (2024). Object recognition based on shape interest points descriptor. Electronics Letters, 60(9), 1–3. https://doi.org/10.1049/ell2.13198
  • Zhang, R., Wang, L., Cheng, S., & Song, S. (2023). MLP-based classification of COVID-19 and skin diseases. Expert Systems with Applications, 228(March), 120389. https://doi.org/10.1016/j.eswa.2023.120389
There are 27 references in total.

Details

Primary Language: English
Subjects: Software Engineering (Other)
Section: Articles
Authors

Ali Khudhair Abbas Ali Ali 0009-0001-9843-1735

Yıldız Aydın 0000-0002-3877-6782

Publication Date: 15 March 2025
Submission Date: 15 January 2025
Acceptance Date: 5 March 2025
Published in Issue: Year 2025, Volume: 15, Issue: 1

Cite

APA Ali, A. K. A. A., & Aydın, Y. (2025). Vision Transformer-Based Approach: A Novel Method for Object Recognition. Karadeniz Fen Bilimleri Dergisi, 15(1), 560-576. https://doi.org/10.31466/kfbd.1620640