TY - JOUR
T1 - Lightweight Transformer Model for Agricultural Land Use and Land Cover Classification
AU - Çelik, Kemal
PY - 2025
DA - September
Y2 - 2025
DO - 10.15832/ankutbd.1624812
JF - Journal of Agricultural Sciences
JO - J Agr Sci-Tarim Bili
PB - Ankara University
WT - DergiPark
SN - 1300-7580
SP - 941
EP - 959
VL - 31
IS - 4
LA - en
AB - Monitoring agricultural land use with remote sensing imagery is essential for ensuring food security, estimating yields, and planning efficient exports; nonetheless, precise classification remains difficult because of the varied and evolving characteristics of agricultural environments. This research evaluates and optimizes advanced deep learning architectures, particularly Vision Transformer (ViT) models, for agricultural land-use classification. Specifically, we employed ViTBase-16 together with the lightweight models DeiT-Tiny and EfficientNet-B0, applying techniques such as model layer compression and advanced data augmentation (CutMix and Cutout) to achieve high accuracy while significantly reducing computational complexity. Evaluation was performed on three benchmark remote sensing datasets (EuroSAT, NWPU-RESISC45, and SIRI-WHU), which cover diverse spatial resolutions and agricultural classes relevant to practical monitoring. Findings indicate that the optimized ViT model is highly effective at capturing global spatial relationships, consistently achieving classification accuracy exceeding 99% on a newly assembled dataset of around 200 Google Earth image samples. Furthermore, for the first time in agricultural image classification, compressing the ViTBase model by pruning 50% of its layers significantly reduced complexity while maintaining competitive accuracy (97.9% on SIRI-WHU). The resulting models are particularly suitable for deployment on devices with limited computational resources, supporting real-world operational agricultural monitoring systems. This study highlights the transformative potential and practical utility of optimized transformer-based models, offering scalable and efficient solutions tailored to precision agriculture applications.
KW - Agriculture
KW - Data augmentation
KW - Deep learning
KW - Vision transformers
KW - Land use
CR - Ahmad M, Shabbir S, Roy S K, Hong D, Wu X, Yao J & Chanussot J (2022). Hyperspectral Image Classification - Traditional to Deep Models: A Survey for Future Prospects. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15: 968–999. doi: 10.1109/JSTARS.2021.3133021
CR - Albarakati H M, Khan M A, Hamza A, Khan F, Kraiem N, Jamel L & Alroobaea R (2024). A Novel Deep Learning Architecture for Agriculture Land Cover and Land Use Classification from Remote Sensing Images Based on Network-Level Fusion of Self-Attention Architecture. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17: 6338–6353. doi: 10.1109/JSTARS.2024.3369950
CR - Albert A, Kaur J & Gonzalez M C (2017). Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Part F129685, 1357–1366. Association for Computing Machinery. doi: 10.1145/3097983.3098070
CR - Alzahrani M S & Alsaade F W (2023). Transform and Deep Learning Algorithms for the Early Detection and Recognition of Tomato Leaf Disease. Agronomy 13(5). doi: 10.3390/agronomy13051184
CR - Attri I, Awasthi L K, Sharma T P & Rathee P (2023). A review of deep learning techniques used in agriculture. Ecological Informatics, Vol. 77. Elsevier B.V. doi: 10.1016/j.ecoinf.2023.102217
CR - Basu S, Ganguly S, Mukhopadhyay S, DiBiano R, Karki M & Nemani R (2015). DeepSat - A learning framework for satellite imagery. GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, 03-06-November-2015. Association for Computing Machinery. doi: 10.1145/2820783.2820816
CR - Bazi Y, Bashmal L, Al Rahhal M M, Dayil R Al & Ajlan N Al (2021). Vision transformers for remote sensing image classification. Remote Sensing, 13(3): 1–20. doi: 10.3390/rs13030516
CR - Beyer L, Zhai X & Kolesnikov A (2022). Better plain ViT baselines for ImageNet-1k. Retrieved from http://arxiv.org/abs/2205.01580
CR - Bowles C, Chen L, Guerrero R, Bentley P, Gunn R, Hammers A & Rueckert D (2018). GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks. Retrieved from http://arxiv.org/abs/1810.10863
CR - Celik F, Balik Sanli F & Bozic D (2024). Transformer Networks to Classify Weeds and Crops in High-Resolution Aerial Images From North East Serbia. Turkish Journal of Field Crops 29(2): 112–120. doi: 10.17557/tjfc.1511404
CR - Chen W, Du X, Yang F, Beyer L, Zhai X, Lin T-Y & Zhou D (2021). A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation. Retrieved from http://arxiv.org/abs/2112.09747
CR - Cheng G, Han J & Lu X (2017, October 1). Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proceedings of the IEEE, Vol. 105, pp. 1865–1883. Institute of Electrical and Electronics Engineers Inc. doi: 10.1109/JPROC.2017.2675998
CR - Cubuk E D, Zoph B, Mane D, Vasudevan V & Le Q V (2019). AutoAugment: Learning augmentation strategies from data. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019-June, 113–123. IEEE Computer Society. doi: 10.1109/CVPR.2019.00020
CR - DeVries T & Taylor G W (2017). Improved Regularization of Convolutional Neural Networks with Cutout. Retrieved from http://arxiv.org/abs/1708.04552
CR - Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T & Houlsby N (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Retrieved from http://arxiv.org/abs/2010.11929
CR - Gandhar A, Gupta K, Pandey A K & Raj D (2024, June 1). Fraud Detection Using Machine Learning and Deep Learning. SN Computer Science, Vol. 5. Springer. doi: 10.1007/s42979-024-02772-x
CR - Gong B, Dai K, Shao J, Jing L & Chen Y (2023). Fish-TViT: A novel fish species classification method in multi water areas based on transfer learning and vision transformer. Heliyon 9(6). doi: 10.1016/j.heliyon.2023.e16761
CR - Guo N, Jiang M, Gao L, Li K, Zheng F, Chen X & Wang M (2023). HFCC-Net: A Dual-Branch Hybrid Framework of CNN and CapsNet for Land-Use Scene Classification. Remote Sensing, 15(20). doi: 10.3390/rs15205044
CR - Hamza A, Khan M A, Ur Rehman S, Albarakati H M, Alroobaea R, Baqasah A M & Masood A (2023). An Integrated Parallel Inner Deep Learning Models Information Fusion with Bayesian Optimization for Land Scene Classification in Satellite Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 16: 9888–9903. doi: 10.1109/JSTARS.2023.3324494
CR - Hamza A, Khan M A, Ur Rehman S, Al-Khalidi M, Alzahrani A I, Alalwan N & Masood A (2024). A Novel Bottleneck Residual and Self-Attention Fusion-Assisted Architecture for Land Use Recognition in Remote Sensing Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17: 2995–3009. doi: 10.1109/JSTARS.2023.3348874
CR - Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z & Tao D (2020). A Survey on Visual Transformer. doi: 10.1109/TPAMI.2022.3152247
CR - He K, Zhang X, Ren S & Sun J (2016a). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 770–778. IEEE Computer Society. doi: 10.1109/CVPR.2016.90
CR - He K, Zhang X, Ren S & Sun J (2016b). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 770–778. IEEE Computer Society. doi: 10.1109/CVPR.2016.90
CR - Helber P, Bischke B, Dengel A & Borth D (2019). EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7): 2217–2226. doi: 10.1109/JSTARS.2019.2918242
CR - Hinton G, Vinyals O & Dean J (2015). Distilling the Knowledge in a Neural Network. Retrieved from http://arxiv.org/abs/1503.02531
CR - Huete A, Didan K, Miura T, Rodriguez E P, Gao X & Ferreira L G (2002). Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Retrieved from www.elsevier.com/locate/rse
CR - Jackson P T, Atapour-Abarghouei A, Bonner S, Breckon T & Obara B (2018). Style Augmentation: Data Augmentation via Style Randomization. Retrieved from http://arxiv.org/abs/1809.05375
CR - Joshi A, Pradhan B, Gite S & Chakraborty S (2023, April 1). Remote-Sensing Data and Deep-Learning Techniques in Crop Mapping and Yield Prediction: A Systematic Review. Remote Sensing, Vol. 15. MDPI. doi: 10.3390/rs15082014
CR - Khan S D & Basalamah S (2023). Multi-Branch Deep Learning Framework for Land Scene Classification in Satellite Imagery. Remote Sensing, 15(13). doi: 10.3390/rs15133408
CR - Khan S, Naseer M, Hayat M, Zamir S W, Khan F S & Shah M (2021). Transformers in Vision: A Survey. doi: 10.1145/3505244
CR - Li X & Li S (2022). Transformer Help CNN See Better: A Lightweight Hybrid Apple Disease Identification Model Based on Transformers. Agriculture (Switzerland), 12(6). doi: 10.3390/agriculture12060884
CR - Martins V S, Kaleita A L, Gelder B K, da Silveira H L F & Abe C A (2020). Exploring multiscale object-based convolutional neural network (multi-OCNN) for remote sensing image classification at high spatial resolution. ISPRS Journal of Photogrammetry and Remote Sensing, 168: 56–73. doi: 10.1016/j.isprsjprs.2020.08.004
CR - Maurício J, Domingues I & Bernardino J (2023, May 1). Comparing Vision Transformers and Convolutional Neural Networks for Image Classification: A Literature Review. Applied Sciences (Switzerland), Vol. 13. MDPI. doi: 10.3390/app13095521
CR - Miotto R, Wang F, Wang S, Jiang X & Dudley J T (2017). Deep learning for healthcare: Review, opportunities and challenges. Briefings in Bioinformatics, 19(6): 1236–1246. doi: 10.1093/bib/bbx044
CR - Nguyen T T, Hoang T D, Pham M T, Vu T T, Nguyen T H, Huynh Q T & Jo J (2020). Monitoring agriculture areas with satellite images and deep learning. Applied Soft Computing Journal, 95. doi: 10.1016/j.asoc.2020.106565
CR - Nie J, Yuan Y, Li Y, Wang H, Li J, Wang Y & Ercisli S (2024). Few-shot Learning in Intelligent Agriculture: A Review of Methods and Applications. Tarim Bilimleri Dergisi 30(2): 216–228. doi: 10.15832/ankutbd.1339516
CR - Papageorgiou E I, Markinos A T & Gemtos T A (2011). Fuzzy cognitive map based approach for predicting yield in cotton crop production as a basis for decision support system in precision agriculture application. Applied Soft Computing Journal, 11(4): 3643–3657. doi: 10.1016/j.asoc.2011.01.036
CR - Park S, Im J, Park S, Yoo C, Han H & Rhee J (2018). Classification and mapping of paddy rice by combining Landsat and SAR time series data. Remote Sensing, 10(3). doi: 10.3390/rs10030447
CR - Ranftl R, Bochkovskiy A & Koltun V (2021). Vision Transformers for Dense Prediction. Retrieved from https://github.com/intel-isl/DPT
CR - Reedha R, Dericquebourg E, Canals R & Hafiane A (2022). Transformer Neural Network for Weed and Crop Classification of High Resolution UAV Images. Remote Sensing, 14(3). doi: 10.3390/rs14030592
CR - Sandler M, Howard A, Zhu M, Zhmoginov A & Chen L C (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 4510–4520. IEEE Computer Society. doi: 10.1109/CVPR.2018.00474
CR - Shastry K A, Sanjay H A & Deexith G (2017). Quadratic-radial-basis-function-kernel for classifying multi-class agricultural datasets with continuous attributes. Applied Soft Computing Journal, 58: 65–74. doi: 10.1016/j.asoc.2017.04.049
CR - Shin H, Jeon S, Seol Y, Kim S & Kang D (2023). Vision Transformer Approach for Classification of Alzheimer’s Disease Using 18F Florbetaben Brain Images. Applied Sciences (Switzerland), 13(6). doi: 10.3390/app13063453
CR - Singh G, Singh S, Sethi G & Sood V (2022). Deep Learning in the Mapping of Agricultural Land Use Using Sentinel-2 Satellite Data. Geographies, 2(4): 691–700. doi: 10.3390/geographies2040042
CR - Slavkovikj V, Verstockt S, De Neve W, Van Hoecke S & Van De Walle R (2015). Hyperspectral image classification with convolutional neural networks. MM 2015 - Proceedings of the 2015 ACM Multimedia Conference, 1159–1162. Association for Computing Machinery, Inc. doi: 10.1145/2733373.2806306
CR - Suh H K, IJsselmuiden J, Hofstee J W & van Henten E J (2018). Transfer learning for the classification of sugar beet and volunteer potato under field conditions. Biosystems Engineering, 174: 50–65. doi: 10.1016/j.biosystemseng.2018.06.017
CR - Suravarapu V K & Patil H Y (2023). Person Identification and Gender Classification Based on Vision Transformers for Periocular Images. Applied Sciences (Switzerland), 13(5). doi: 10.3390/app13053116
CR - Szegedy C, Vanhoucke V, Ioffe S, Shlens J & Wojna Z (2016). Rethinking the Inception Architecture for Computer Vision. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 2818–2826. IEEE Computer Society. doi: 10.1109/CVPR.2016.308
CR - Tan M & Le Q V (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Retrieved from http://arxiv.org/abs/1905.11946
CR - Tang X, Zhang X, Liu F & Jiao L (2018). Unsupervised deep feature learning for remote sensing image retrieval. Remote Sensing, 10(8). doi: 10.3390/rs10081243
CR - Temenos A, Temenos N, Kaselimi M, Doulamis A & Doulamis N (2023). Interpretable Deep Learning Framework for Land Use and Land Cover Classification in Remote Sensing Using SHAP. IEEE Geoscience and Remote Sensing Letters, 20. doi: 10.1109/LGRS.2023.3251652
CR - Thakur P S, Chaturvedi S, Khanna P, Sheorey T & Ojha A (2023). Vision transformer meets convolutional neural network for plant disease classification. Ecological Informatics, 77. doi: 10.1016/j.ecoinf.2023.102245
CR - Thirumaladevi S, Veera Swamy K & Sailaja M (2023a). Remote sensing image scene classification by transfer learning to augment the accuracy. Measurement: Sensors, 25. doi: 10.1016/j.measen.2022.100645
CR - Thirumaladevi S, Veera Swamy K & Sailaja M (2023b). Remote sensing image scene classification by transfer learning to augment the accuracy. Measurement: Sensors, 25. doi: 10.1016/j.measen.2022.100645
CR - Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L & Polosukhin I (2017). Attention Is All You Need.
CR - Vohra R & Tiwari K C (2023). Land cover classification using multi-fusion based dense transpose convolution in fully convolutional network with feature alignment for remote sensing images. Earth Science Informatics, 16(1): 983–1003. doi: 10.1007/s12145-022-00891-8
CR - Xia Z, Pan X, Song S, Erran Li L & Huang G (2022). Vision Transformer with Deformable Attention. Retrieved from https://github.com/LeapLabTHU/DAT
CR - Yun S, Han D, Oh S J, Chun S, Choe J & Yoo Y (2019). CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. Retrieved from http://arxiv.org/abs/1905.04899
CR - Zahra U, Khan M A, Alhaisoni M, Alasiry A, Marzougui M & Masood A (2024). An Integrated Framework of Two-Stream Deep Learning Models Optimal Information Fusion for Fruits Disease Recognition. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17: 3038–3052. doi: 10.1109/JSTARS.2023.3339297
CR - Zhai X, Kolesnikov A, Houlsby N & Beyer L (2021). Scaling Vision Transformers. Retrieved from http://arxiv.org/abs/2106.04560
CR - Zhang H, Cisse M, Dauphin Y N & Lopez-Paz D (2017). mixup: Beyond Empirical Risk Minimization. Retrieved from http://arxiv.org/abs/1710.09412
CR - Zhang M, Lin H, Wang G, Sun H & Fu J (2018). Mapping paddy rice using a Convolutional Neural Network (CNN) with Landsat 8 datasets in the Dongting Lake Area, China. Remote Sensing, 10(11). doi: 10.3390/rs10111840
CR - Zhang X, Zhou X, Lin M & Sun J (2018). ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 6848–6856. IEEE Computer Society. doi: 10.1109/CVPR.2018.00716
CR - Zhao B, Zhong Y, Xia G S & Zhang L (2016). Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 54(4): 2108–2123. doi: 10.1109/TGRS.2015.2496185
CR - Zhao S, Tu K, Ye S, Tang H, Hu Y & Xie C (2023, November 3). Land Use and Land Cover Classification Meets Deep Learning: A Review. Sensors (Basel, Switzerland), Vol. 23. doi: 10.3390/s23218966
CR - Zuo S, Xiao Y, Chang X & Wang X (2022). Vision transformers for dense prediction: A survey. Knowledge-Based Systems, 253. doi: 10.1016/j.knosys.2022.109552
UR - https://doi.org/10.15832/ankutbd.1624812
L1 - https://dergipark.org.tr/en/download/article-file/4542577
ER -
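The abstract above describes compressing ViTBase-16 by pruning 50% of its transformer layers before fine-tuning on land-use scenes. The sketch below is only a rough illustration of that idea, not the author's implementation: the timm model name vit_base_patch16_224, the evenly spaced block-selection rule, the keep_ratio parameter, and the 10-class head (e.g. for EuroSAT) are all assumptions introduced here.

```python
# Hedged sketch of the 50% layer-pruning idea summarized in the abstract.
# Assumptions (not from the paper): timm's vit_base_patch16_224 backbone,
# evenly spaced block selection, and a 10-class head (e.g. EuroSAT).
import torch
import torch.nn as nn
import timm


def build_pruned_vit(num_classes: int = 10,
                     keep_ratio: float = 0.5,
                     pretrained: bool = True) -> nn.Module:
    """Return a ViT-Base/16 whose encoder keeps only `keep_ratio` of its blocks."""
    model = timm.create_model("vit_base_patch16_224",
                              pretrained=pretrained,
                              num_classes=num_classes)
    blocks = list(model.blocks)                     # 12 transformer blocks
    n_keep = max(1, int(len(blocks) * keep_ratio))  # 6 blocks for keep_ratio=0.5
    # Keep evenly spaced blocks; the paper may select layers differently.
    keep_idx = torch.linspace(0, len(blocks) - 1, n_keep).round().long().tolist()
    model.blocks = nn.Sequential(*[blocks[i] for i in keep_idx])
    return model


if __name__ == "__main__":
    # Smoke test without downloading weights; use pretrained=True when fine-tuning.
    model = build_pruned_vit(num_classes=10, pretrained=False)
    x = torch.randn(2, 3, 224, 224)                 # batch of 224x224 RGB patches
    print(model(x).shape)                           # -> torch.Size([2, 10])
```

The pruned backbone would then be fine-tuned in the usual way (for example with the CutMix and Cutout augmentations the abstract mentions); dropping half the encoder blocks removes roughly half the encoder parameters and compute, which is the lightweight-deployment argument the abstract makes.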