Research Article

LAViT: Class-Aware Vision Transformer with Learnable Attention for Pest Recognition in Agricultural Fields

Volume: 32 Number: 2 March 24, 2026

Abstract

Insect pests pose a significant threat to agricultural productivity, making early and accurate identification essential for effective pest management. This study proposes a novel deep learning-based classification framework for multi-class pest recognition from field images. The proposed approach enhances discriminative region representation by integrating a patch embedding module, a Vision Transformer backbone, and a learnable spatial attention mask. This hybrid design enables the model to focus on critical visual cues without requiring segmentation-based preprocessing. The attention mask, learned via convolutional layers, is pooled and directly applied to Transformer-encoded patch tokens to refine spatial feature emphasis. Positional embeddings are further employed to preserve spatial context within the tokenized image representation. Experimental evaluations conducted on publicly available pest datasets with 5, 9, and 12 classes demonstrate the effectiveness and robustness of the proposed framework. The model achieves accuracies of 99.67%, 99.52%, and 97.00% on the Pest5, Pest9, and Pest12 datasets, respectively, indicating strong generalization across varying classification complexities. To enhance model transparency and reliability, visual interpretability is provided through Grad-CAM and attention heatmap visualizations that reveal the model’s focus regions. Additionally, t-SNE-based feature visualization illustrates clear separability in the learned embedding space. The proposed framework shows strong potential for practical deployment in smart agriculture and precision pest monitoring systems. 
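The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract only states that the mask is learned via convolutional layers, pooled, and applied to the encoded patch tokens, so the following are all assumptions made for illustration: average pooling, element-wise multiplication as the way the mask is applied, a hand-made mask in place of the convolutional output, an identity stand-in for the Transformer encoder, and every name and size used (`patchify`, `pool_mask`, `W_embed`, `pos_embed`, a 32x32 image with 8x8 patches).

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches, shape (N, patch*patch*C)."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    # Bring the two grid axes to the front so patches are ordered row by row.
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)


def pool_mask(mask, patch):
    """Average-pool an (H, W) mask to one weight per patch, shape (N,), in the same order as patchify."""
    h, w = mask.shape
    gh, gw = h // patch, w // patch
    m = mask[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return m.mean(axis=(1, 3)).reshape(gh * gw)


rng = np.random.default_rng(0)
H = W = 32   # hypothetical image size
P = 8        # hypothetical patch size, giving a 4x4 grid of 16 patches
D = 16       # hypothetical embedding dimension

image = rng.random((H, W, 3))

# Stand-in for the learned mask: in the paper it comes from convolutional layers.
mask = np.zeros((H, W))
mask[8:24, 8:24] = 1.0

# Patch embedding (a linear projection) plus positional embeddings.
W_embed = rng.normal(size=(P * P * 3, D)) * 0.02
pos_embed = rng.normal(size=((H // P) * (W // P), D)) * 0.02
tokens = patchify(image, P) @ W_embed + pos_embed

# Identity stand-in for the Transformer encoder, which is omitted here.
encoded = tokens

# Pool the mask to patch resolution and reweight each encoded token (assumed multiplicative).
weights = pool_mask(mask, P)
refined = encoded * weights[:, None]
```

With this toy mask, the four central patches keep their token values and the twelve border patches are zeroed, which shows how a pooled spatial mask can emphasize some tokens without any segmentation step. A learned mask would normally produce soft weights between 0 and 1 rather than a hard cut.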


Details

Primary Language

English

Subjects

Evolutionary Computation

Journal Section

Research Article

Publication Date

March 24, 2026

Submission Date

July 10, 2025

Acceptance Date

December 31, 2025

Published in Issue

Year 2026 Volume: 32 Number: 2

APA
Taştimur Temiz, C., & Kaya, V. (2026). LAViT: Class-Aware Vision Transformer with Learnable Attention for Pest Recognition in Agricultural Fields. Journal of Agricultural Sciences, 32(2), 474-497. https://doi.org/10.15832/ankutbd.1739676

Journal of Agricultural Sciences is published as an open access journal. All articles are published under the terms of the Creative Commons Attribution License (CC BY).