LAViT: Class-Aware Vision Transformer with Learnable Attention for Pest Recognition in Agricultural Fields

Canan Taştimur Temiz; Volkan Kaya

doi:10.15832/ankutbd.1739676

Abstract

Insect pests pose a significant threat to agricultural productivity, making early and accurate identification essential for effective pest management. This study proposes a novel deep learning-based classification framework for multi-class pest recognition from field images. The proposed approach enhances discriminative region representation by integrating a patch embedding module, a Vision Transformer backbone, and a learnable spatial attention mask. This hybrid design enables the model to focus on critical visual cues without requiring segmentation-based preprocessing. The attention mask, learned via convolutional layers, is pooled and directly applied to Transformer-encoded patch tokens to refine spatial feature emphasis. Positional embeddings are further employed to preserve spatial context within the tokenized image representation. Experimental evaluations conducted on publicly available pest datasets with 5, 9, and 12 classes demonstrate the effectiveness and robustness of the proposed framework. The model achieves accuracies of 99.67%, 99.52%, and 97.00% on the Pest5, Pest9, and Pest12 datasets, respectively, indicating strong generalization across varying classification complexities. To enhance model transparency and reliability, visual interpretability is provided through Grad-CAM and attention heatmap visualizations that reveal the model’s focus regions. Additionally, t-SNE-based feature visualization illustrates clear separability in the learned embedding space. The proposed framework shows strong potential for practical deployment in smart agriculture and precision pest monitoring systems.
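The masking step described in the abstract (a spatial attention mask that is pooled and applied to the patch tokens) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the pooling type or how the mask is combined with the tokens, so average pooling and element-wise multiplication are assumptions here. The function names `pool_mask_to_patches` and `apply_mask_to_tokens` are hypothetical, and the mask is passed in as a plain array rather than learned by convolutional layers.

```python
import numpy as np


def pool_mask_to_patches(mask, patch_size):
    """Average-pool an (H, W) spatial mask to one weight per patch.

    Assumption: average pooling; the abstract only says the mask is "pooled".
    Returns an array of shape (H // patch_size, W // patch_size).
    """
    h, w = mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("mask height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Split each spatial axis into (grid index, offset inside the patch),
    # then average over the two within-patch axes.
    return mask.reshape(gh, patch_size, gw, patch_size).mean(axis=(1, 3))


def apply_mask_to_tokens(tokens, mask, patch_size):
    """Scale each patch token by its pooled mask weight.

    tokens: (N, D) array of encoded patch tokens in row-major patch order,
            with N = (H // patch_size) * (W // patch_size).
    Assumption: the mask is applied by element-wise multiplication.
    """
    weights = pool_mask_to_patches(mask, patch_size).reshape(-1, 1)
    if weights.shape[0] != tokens.shape[0]:
        raise ValueError("number of tokens does not match the patch grid")
    return tokens * weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: a 224x224 mask, 16x16 patches, 64-dim tokens.
    mask = rng.random((224, 224))        # stand-in for a learned mask in [0, 1)
    tokens = rng.normal(size=(196, 64))  # 14 * 14 = 196 patch tokens
    refined = apply_mask_to_tokens(tokens, mask, patch_size=16)
    print(refined.shape)
```

In this sketch, tokens from patches with low mask values are attenuated and tokens from high-valued patches pass through nearly unchanged, which is one simple way to emphasize discriminative regions without a separate segmentation step.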

Details

Primary Language

English

Subjects

Evolutionary Computation

Journal Section

Research Article

Authors

Canan Taştimur Temiz
0000-0002-3714-6826
Türkiye

Volkan Kaya*
0000-0001-6940-3260
Türkiye

Publication Date

March 24, 2026

Submission Date

July 10, 2025

Acceptance Date

December 31, 2025

Published in Issue

Year 2026 Volume: 32 Number: 2

DOI

https://doi.org/10.15832/ankutbd.1739676

IZ

https://izlik.org/JA39KB82EA

Cite

APA

Taştimur Temiz, C., & Kaya, V. (2026). LAViT: Class-Aware Vision Transformer with Learnable Attention for Pest Recognition in Agricultural Fields. Journal of Agricultural Sciences, 32(2), 474-497. https://doi.org/10.15832/ankutbd.1739676
