Research Article

LAViT: Class-Aware Vision Transformer with Learnable Attention for Pest Recognition in Agricultural Fields

Year 2026, Volume: 32 Issue: 2, 474 - 497, 24.03.2026
https://doi.org/10.15832/ankutbd.1739676
https://izlik.org/JA39KB82EA

Abstract

Insect pests pose a significant threat to agricultural productivity, making early and accurate identification essential for effective pest management. This study proposes a novel deep learning-based classification framework for multi-class pest recognition from field images. The proposed approach enhances discriminative region representation by integrating a patch embedding module, a Vision Transformer backbone, and a learnable spatial attention mask. This hybrid design enables the model to focus on critical visual cues without requiring segmentation-based preprocessing. The attention mask, learned via convolutional layers, is pooled and directly applied to Transformer-encoded patch tokens to refine spatial feature emphasis. Positional embeddings are further employed to preserve spatial context within the tokenized image representation. Experimental evaluations conducted on publicly available pest datasets with 5, 9, and 12 classes demonstrate the effectiveness and robustness of the proposed framework. The model achieves accuracies of 99.67%, 99.52%, and 97.00% on the Pest5, Pest9, and Pest12 datasets, respectively, indicating strong generalization across varying classification complexities. To enhance model transparency and reliability, visual interpretability is provided through Grad-CAM and attention heatmap visualizations that reveal the model’s focus regions. Additionally, t-SNE-based feature visualization illustrates clear separability in the learned embedding space. The proposed framework shows strong potential for practical deployment in smart agriculture and precision pest monitoring systems. 
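The mask step described in the abstract (a spatial mask that is pooled and then applied to the Transformer-encoded patch tokens) can be illustrated with a minimal sketch. Several details here are assumptions, because the abstract does not specify them: average pooling, mask values in [0, 1], and element-wise multiplication as the way the mask is applied. In the paper the mask is learned via convolutional layers; here it is a fixed toy array, and the function names `avg_pool_to_patches` and `apply_mask` are hypothetical.

```python
def avg_pool_to_patches(mask, patch):
    """Average-pool an H x W mask to one weight per patch (row-major order).

    Assumption: average pooling; the abstract only says the mask is "pooled".
    """
    height, width = len(mask), len(mask[0])
    pooled = []
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            block = [mask[i][j]
                     for i in range(r, r + patch)
                     for j in range(c, c + patch)]
            pooled.append(sum(block) / len(block))
    return pooled


def apply_mask(tokens, weights):
    """Scale each patch token (a feature vector) by its pooled mask weight.

    Assumption: multiplicative gating of the encoded patch tokens.
    """
    return [[w * x for x in token] for token, w in zip(tokens, weights)]


# Toy 4 x 4 mask with 2 x 2 patches, giving 4 patch tokens of dimension 2.
mask = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

weights = avg_pool_to_patches(mask, patch=2)
refined = apply_mask(tokens, weights)
```

In this toy case the first patch keeps its features unchanged, the two patches with zero mask weight are suppressed, and the last patch is halved, which is the kind of spatial emphasis the abstract attributes to the attention mask.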

Details

Primary Language: English
Subjects: Evolutionary Computation
Journal Section: Research Article
Authors: Canan Taştimur Temiz (ORCID 0000-0002-3714-6826), Volkan Kaya (ORCID 0000-0001-6940-3260)
Submission Date: July 10, 2025
Acceptance Date: December 31, 2025
Publication Date: March 24, 2026
DOI: https://doi.org/10.15832/ankutbd.1739676
IZ: https://izlik.org/JA39KB82EA
Published in Issue: Year 2026, Volume: 32, Issue: 2

Cite

APA Taştimur Temiz, C., & Kaya, V. (2026). LAViT: Class-Aware Vision Transformer with Learnable Attention for Pest Recognition in Agricultural Fields. Journal of Agricultural Sciences, 32(2), 474-497. https://doi.org/10.15832/ankutbd.1739676

Journal of Agricultural Sciences is published as an open access journal. All articles are published under the terms of the Creative Commons Attribution License (CC BY).