Land Quality Index for Paddy (Oryza sativa L.) Cultivation Area Based on Deep Learning Approach using Geographical Information System and Geostatistical Techniques

Türkiye has ideal ecological conditions for growing rice, and its yield per hectare is often higher than the worldwide average. However, unbalanced fertilization, nutrient deficiency, and irrigation problems negatively affect paddy production when soil characteristics are not considered. The present study was conducted on a 1763-hectare field (652000-659000 E-W and 4528000-4536000 N-S) in 2019. The primary goal of this study was to categorize land quality for rice production using 15 different physicochemical parameters together with GIS (Geographical Information Systems) and deep learning (DL) techniques. Using these parameters, soil types were classified and regression analysis was performed by DL. Using different soil parameters as network outputs led to different performance levels in the models. Therefore, a different model was suggested for each network output. The R² values indicated a respectable level for parameter prediction, and an accuracy of 88% was attained when classifying "class" data. The findings of the study demonstrated that deep learning may be used to forecast soil metrics and distinguish between different land quality classes. Additionally, a field investigation was used to validate the indicated land quality classifications. Using statistical techniques, a substantial positive link between rice yield and land quality classes was discovered.


Introduction
Rice (Oryza sativa L.), a warm-climate grain, is the main food source for 50% of the world's population (Sirat et al., 2012; Akay et al., 2017). Although rice cultivation is conducted in all geographical regions in Türkiye, 56.0% of the total rice area is in the Thrace-Marmara region, 36.5% in the Black Sea region, and 7.5% in other regions (Meral and Temizel, 2006; Garris et al., 2005). Considering different scenarios derived from climate change models, food security is the most pressing issue in densely populated developing countries (Jagadish et al., 2007). A great effort has been exerted to meet the nutritional needs of the growing population in developing countries by achieving consistently high yield rates (Araus and Cairns, 2014). In addition, the identification and classification of plant diseases is one of the most vital methods for early-stage intervention to increase yield (Shrivastava et al., 2019). Ensuring the ecological sustainability of soils, one of the important components of both the economy and the terrestrial ecosystem, and using land according to its needs and management are likewise important for increasing yield and quality. Therefore, important studies have been conducted on soil and land quality index approaches in recent years. Alluvial lands and soils, which are widely classified in the Fluvent suborder of the Entisol order, often show large variations in features such as texture or organic matter distribution over short distances (Dengiz, 2010). Identifying land quality is a difficult process because of the complex relationship between the physical, chemical, and biological properties of the soil and other factors. Many studies have investigated the relationships between the physical, chemical, and biological properties of soil types and yield (Dengiz, 2013; Li et al., 2018; Dedeoğlu and Dengiz, 2019; Mwendwa et al., 2019; Rezaee et al., 2020). While it is possible to assess land quality by directly conducting land testing, several model-based approaches such as comparative assessment, dynamic assessment, and the land quality index (LQI) can be used indirectly. Since direct approaches are generally expensive, labor-intensive, and time-consuming, model-based approaches are used more often (Dengiz, 2020). The Land Quality Index (LQI) approach was used in the rice land assessment. In this approach, the assessment usually starts with the creation of a data set; indicators with different units are then graded by assigning score ratios according to how severely they limit plant growth. In addition, the applicability of deep learning, which had not previously been used in rice land quality studies, was investigated in this study.
Deep learning is a modern and popular technique for image processing and data analysis with promising results and great potential. Deep learning, which has been successfully applied in various fields, has recently been used in precision agriculture applications (Kamilaris, 2018). For example, computer-aided diagnosis (CAD) systems using artificial intelligence (AI) have been used to accurately identify diseases and pests affecting small farmers' production and to help understand the severity of symptoms, allowing any farmer with access to a smartphone to benefit from expert knowledge in a practical and cost-effective manner (Esgario et al., 2020). Azizi et al. (2020) used a convolutional neural network (CNN), a deep learning method, to classify soil clusters, using VggNet16, ResNet50, and Inception-v4 trained models to train the CNN. Esgario et al. (2020) used deep learning to classify biotic stress in coffee and to estimate its severity; ResNet50 produced a high accuracy rate of 95.24%. Padarian et al. (2019) used a CNN to predict soil properties from regional spectral data; their study showed that the prediction error could be reduced by 87% compared to partial least squares (PLS), a traditional method, and that a high accuracy rate could be achieved in deeper soil layers.
The decrease in land quality due to intensive rice cultivation threatens the sustainability of rice agriculture in the Çorum-Osmancık region of Türkiye. In the present study, we focused on identifying detailed rice land quality classes and mapping their spatial distributions in order to support sustainable rice agriculture practices. The possibility of using the deep learning technique, accompanied by geographic information systems and geostatistics, to determine the land quality classes for rice production has been investigated in this study. The identification of land quality classes has also been validated with data collected from field studies.

Study Area
The study area is located within the boundaries of Çorum-Osmancık district, in the Kızılırmak Valley of the Western Black Sea region, between the coordinates 652000-659000 E-W and 4528000-4536000 N-S (WGS84, Zone 36 UTM-m). The study area covers approximately 1,763 ha and lies between 399 and 480 m above sea level (Figure 1). It is located in a transition area between the Black Sea and Central Anatolian climate regimes and falls into the semi-arid climate class. The physico-chemical properties of the study area were assessed in terms of the coefficient of variation (CV), which clearly indicated that the soil properties were highly variable.
The region is bounded to the west by the Ilgaz Mountains, which extend in an east-west direction, and to the south by their extensions and the Koroglu Mountains. The geological structure of these mountainous areas is generally composed of Paleozoic metamorphic rocks. The wide valley-bottom plains through which the Kızılırmak (the Halys river) flows are made up of alluvial deposits belonging to the Quaternary period. Generally, rice is grown on soil formed on these alluvial deposits. The study area is mostly flat or slightly inclined (0.0-2.0%). A total of eight soil series have been identified in the study area. Dengiz et al. (2009) defined 29 mapping units according to the digital soil map they created (Figure 2).

Soil sampling and indicator selection
In this study, a total of 246 disturbed and undisturbed soil samples were collected from the surface layer (0-30 cm) at the points of a 400 m x 400 m grid, covering the Vertisol, Entisol, and Inceptisol soil types. Soil sampling was conducted in the autumn, after the harvest, in order to avoid the effects of soil management processes such as fertilization and irrigation during the rice-growing period. Each land mapping unit, defined by its unique soil and land properties, significantly affects the suitability of the land for the determined land utilization type. Therefore, it is necessary to identify the land needs of each land utilization type for a successful and sustainable agricultural practice.
The land utilization type investigated in this study is rice. The literature was examined in order to identify the land needs of rice and the soil physicochemical and topographic indicators required for the model (FAO, 1983, 1985; Sys et al., 1993; Mongkolsawat et al., 2002; Bunting, 1981; Dengiz, 2013; Sezer and Dengiz, 2014; Dengiz et al., 2015; Nath et al., 2016). The development of the rice plant depends on the physical and chemical conditions of the soil, which affect the plant's root system and its ability to grow efficiently. Moron (2005) stated that the indicators used in soil quality identification should be sensitive enough to track changes and be easily measured and interpreted. Fifteen quality parameters fall into two main categories for the rice land quality index model (LQIR): (i) the Nutrient Availability Index (NAI), including the nitrogen, phosphorus, potassium, and zinc content of the soil, and (ii) the Soil Quality Index (SQI), including slope, soil depth, bulk density, clay, silt, sand, hydraulic conductivity (HC), organic matter, electrical conductivity (EC), lime (CaCO3), and soil reaction (pH). Table 1 shows the analytical protocols used.

Land quality index and rating assignment
The rice plant prefers soil that is deep, clayey, and rich in plant nutrients and organic matter, and it is moderately tolerant of salt (Özkan et al., 2019).
Land quality indicators were used in the study area (Table 2). The rice land quality index consists of the nutrient availability index (NAI) and the soil quality index (SQI). The NAI was calculated with the formula of Dengiz (2013), and the SQI with the formula of Gupta and Abrol (1993).
In these formulas, Cy is clay, Si is silt, Sa is sand, D is soil depth, F is slope, P is bulk density, G is hydraulic conductivity, S is exchangeable sodium percentage (ESP), K is CaCO3 content, and H is pH. Each indicator was scored with a ratio value between 0.2 and 1.0. An indicator takes a value of 1.0 if the analysis result represents the most suitable condition for rice cultivation and 0.2 if it represents the most unfavorable condition; intermediate values are assigned according to how severely the indicator limits rice growth. Spatial information on the indicators of both the NAI and the SQI was obtained from land mapping units and surface soil samples. In order to identify the land quality index value for rice, the following formula was used (Dengiz, 2013; Sezer and Dengiz, 2014).

LQIR (land quality index) = NAI × SQI (3)
The above-mentioned formula was applied to each soil sample. The higher the resulting value, the higher the suitability of the land for the specified land utilization type. The rice land quality classification according to Dengiz (2013) is given in Table 3. For the purpose of model verification, a randomized block field trial using 12 paddy varieties (Sumnu, Osmancik-97, Gonen, Beser, Duragan, Halilbey, 7721, Karadeniz, Kizilirmak, Koral, Negis, and Aromatik) was carried out for two years in each quality class in the study area. In the experiment, in which the broadcast sowing method was applied, plot yields were obtained after removing the edge effect, so that the plot size was 4 x 4 = 16 m² and the harvest area was 3 x 4 = 12 m² (Sezer et al., 2017). ANOVA and the LSD test (0.05 level) were performed for the grain yields. In addition, the IBM SPSS Statistics v.23 program was used to obtain basic descriptive statistics (IBM, 2015).
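As a minimal sketch of Equation 3, the snippet below multiplies the two sub-indices for a few samples and ranks them. The function name `land_quality_index` and the sample values are hypothetical and serve only as illustration; they are not data from the study, and the class thresholds of Table 3 are deliberately not reproduced.

```python
def land_quality_index(nai: float, sqi: float) -> float:
    """Equation 3: LQIR = NAI x SQI."""
    return nai * sqi


# Hypothetical (NAI, SQI) pairs for three soil samples; not values from the study.
samples = {
    "sample_a": (0.90, 0.80),
    "sample_b": (0.60, 0.50),
    "sample_c": (0.30, 0.40),
}

# Apply the formula to every sample, then rank from highest to lowest LQIR.
lqir = {name: land_quality_index(nai, sqi) for name, (nai, sqi) in samples.items()}
ranked = sorted(lqir, key=lqir.get, reverse=True)

for name in ranked:
    print(f"{name}: LQIR = {lqir[name]:.2f}")
```

Because the index is a product, a low score on either sub-index pulls the overall value down, which matches the idea that the most limiting factor should dominate the rating.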

Deep learning and algorithms
Classification and estimation are skills that a person learns and uses many times throughout their life. Earlier neural networks had only one or two hidden layers; however, deep models may have a hundred layers (Goodfellow et al., 2016). These layers are used to classify pre-labeled input data or to perform numerical prediction (Kamilaris, 2018). The many connections between layers generate a large number of parameters, which are usually initialized with random values.

Architecture of deep learning and tools
Although deep learning architectures differ in their unique features, they all share the same aim, which is to reduce the complexity of the model and increase its accuracy (Esgario et al., 2020). The model in the present study was trained on Google Colaboratory (2020), a free Jupyter notebook environment running in the cloud. Keras (2020), a Python deep learning library, was used as the deep learning package with a TensorFlow backend. The Python 3 programming language was used to implement the deep model. In addition to the libraries required to implement deep learning algorithms, the NumPy, Pandas, and Matplotlib libraries were used. Feedforward Neural Networks (FNN), a basic deep learning method, were used (Goodfellow et al., 2016). The FNN layers were built with the Sequential model, the most common model type, which provides a plain stack of layers in which each layer has one input tensor and one output tensor (Chollet, 2020). Fifteen different physicochemical properties (pH, EC, lime, OM, depth, slope, HC, BD, clay, silt, sand, N, P, K, and Zn) of the soil types investigated in the study were chosen as input layer parameters in the deep learning system. The ReLU (Rectified Linear Unit) activation function, which is widely used to determine the activation status of neurons and offers a computational advantage, was used in the study. RMSprop, based on gradient descent, was used as the optimization method, with a learning rate of 0.001. In order to eliminate the uncertainty caused by network randomness, a fixed random seed was set at the beginning of the program.
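The three ingredients named above (ReLU activation, RMSprop with a learning rate of 0.001, and a fixed seed) can be sketched in plain Python, independently of Keras. This is an illustration of the general technique only, not the study's network: the decay factor `rho=0.9`, the stabiliser `eps`, the seed value 42, and the toy quadratic loss are all assumptions made for the example.

```python
import math
import random


def relu(x: float) -> float:
    """Rectified Linear Unit: passes positive inputs, zeroes negative ones."""
    return max(0.0, x)


def rmsprop_step(weights, grads, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update; rho and eps are assumed values, lr follows the text."""
    new_weights, new_cache = [], []
    for w, g, c in zip(weights, grads, cache):
        c = rho * c + (1.0 - rho) * g * g          # running mean of squared gradients
        new_cache.append(c)
        new_weights.append(w - lr * g / (math.sqrt(c) + eps))
    return new_weights, new_cache


random.seed(42)  # fixed seed so that repeated runs start from the same weights
weights = [random.uniform(-1.0, 1.0) for _ in range(3)]
cache = [0.0] * len(weights)


def loss(w):
    """Toy quadratic loss standing in for a network cost function."""
    return sum(wi * wi for wi in w)


def grad(w):
    """Gradient of the toy loss with respect to each weight."""
    return [2.0 * wi for wi in w]


initial_loss = loss(weights)
for _ in range(200):
    weights, cache = rmsprop_step(weights, grad(weights), cache)
final_loss = loss(weights)

print(relu(-1.5), relu(2.0))
print(initial_loss > final_loss)
```

RMSprop divides each gradient by a running estimate of its magnitude, so every weight moves by roughly the learning rate per step regardless of how large its raw gradient is.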

Training, Test, and Validation
In the present study, the dataset was divided into training (80%) and test (20%) sets. In addition, 20% of the training set was chosen as validation data. In deep neural networks, learning is based on a gradient descent algorithm and the backpropagation approach. The cross-entropy cost function was used for classification. After the cost function was calculated, its derivative with respect to the weights was computed. For regression, the MSE (mean squared error) loss function was used.
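The 80/20 split with a further 20% of the training set held out for validation can be sketched as follows. The helper name `split_indices`, the seed, and the use of `round` to turn fractions into counts are assumptions for illustration; the text does not state how fractional counts were handled.

```python
import random


def split_indices(n_samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train, validation, and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # fixed seed for repeatability

    n_test = round(n_samples * test_frac)          # 20% of all samples
    test, train_full = indices[:n_test], indices[n_test:]

    n_val = round(len(train_full) * val_frac)      # 20% of the training portion
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


# 246 is the number of soil samples reported in the study.
train_idx, val_idx, test_idx = split_indices(246)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under the rounding assumed here, 246 samples are divided into 158 training, 39 validation, and 49 test samples.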

Performance metrics
During network training, the model weights that gave the minimum cost function value on the validation set were recorded. These recorded models were then assessed using the test dataset. In the classification study, the results were compared in terms of the confusion matrix and accuracy (ACC). In the regression study, results were assessed in terms of RMSE and R². To evaluate the proposed deep learning algorithms, the accuracy metric was used as shown in Equation 4:
Accuracy = (TP + TN)/(TP + TN + FP + FN) (4)
where TP, TN, FP, and FN are true positive, true negative, false positive, and false negative, respectively (Aggarwal and Agrawal, 2012).
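The metrics named in this section can be sketched in a few lines of plain Python. Accuracy is computed from a confusion matrix as the diagonal sum over the total, which reduces to Equation 4 in the two-class case; the function names and the small example inputs are hypothetical.

```python
import math


def accuracy(confusion):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical two-class confusion matrix laid out as [[TN, FP], [FN, TP]].
binary_confusion = [[5, 1], [2, 2]]
print(accuracy(binary_confusion))

# Hypothetical measured and predicted index values.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(measured, predicted), r2_score(measured, predicted))
```

The same `accuracy` function works unchanged for a four-class confusion matrix, since it only uses the diagonal and the grand total.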

Interpolation Analyses
Interpolation techniques are used to express and map how the values of the investigated properties vary with distance (Goovaerts, 1999; Mulla and McBratney, 2000).
Inverse Distance Weighting (IDW), one of the most commonly used interpolation models, was applied to identify the spatial distribution of the rice land quality index (LQIR) value for each point defined within the study area. Deterministic Radial Basis Function (RBF, spline) models and stochastic (Kriging) models, namely ordinary, universal, and simple Kriging, were also used. A total of 15 models were used to form the spatial distribution map of LQIR: Inverse Distance Weighting (IDW) 1, 2, and 3; the Radial Basis Function (RBF) models Thin Plate Spline (TPS), Completely Regularized Spline (CRS), and Spline With Tension (ST); and the Ordinary, Simple, and Universal Kriging models. The method that provided the lowest root mean square error (RMSE) was assessed as the most suitable method. The following formula was used to calculate the RMSE.
The terms of the formula (Zi and the related symbols) refer to the estimated value, the measured value, and the number of samples.
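The model-selection idea described above can be sketched for the IDW case. The code assumes that "IDW 1, 2, 3" denotes the distance power, uses the standard RMSE definition (square root of the mean squared difference between estimated and measured values), and scores each power by leave-one-out cross-validation; the validation scheme and the sample points are assumptions, not details from the study.

```python
import math


def idw(points, query, power=2):
    """Inverse distance weighted estimate at query from (x, y, value) points."""
    numerator = denominator = 0.0
    for x, y, value in points:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            return value                       # query coincides with a sample
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def loo_rmse(points, power=2):
    """Leave-one-out RMSE: predict each point from all the others."""
    squared = 0.0
    for i, (x, y, measured) in enumerate(points):
        others = points[:i] + points[i + 1:]
        squared += (idw(others, (x, y), power) - measured) ** 2
    return math.sqrt(squared / len(points))


# Hypothetical LQIR values on a 400 m grid; not data from the study.
points = [
    (0, 0, 0.20), (400, 0, 0.40), (800, 0, 0.45),
    (0, 400, 0.60), (400, 400, 0.80), (800, 400, 0.70),
]

# Pick the power with the lowest cross-validated RMSE.
scores = {p: loo_rmse(points, p) for p in (1, 2, 3)}
best_power = min(scores, key=scores.get)
print(scores, best_power)
```

The same comparison extends to the spline and Kriging models by replacing `idw` with the corresponding predictor and keeping the RMSE criterion unchanged.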

Soil physico-chemical characteristics
The descriptive statistics of some physico-chemical properties of the soil samples are shown in Table 5. Wilding et al. (1994) and Mulla and McBratney (2000) classified variability as low if the CV is less than 15%, moderate if the CV is between 15% and 35%, and high if the CV is greater than 35%. In this sense, pH had a low CV. On the other hand, HC, sand, EC, AvP, AvK, AvZn, and OM content showed a high level of variability. In this study, clay, silt, sand, BD, HC, pH, EC, and CaCO3 showed a normal data distribution.
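The variability classes quoted above translate directly into a small helper. The sketch below computes the CV as the sample standard deviation divided by the mean (the choice of the sample rather than the population estimator is an assumption) and applies the 15% and 35% thresholds; the pH readings are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def variability_class(cv_percent):
    """Thresholds quoted in the text: <15 low, 15-35 moderate, >35 high."""
    if cv_percent < 15.0:
        return "low"
    if cv_percent <= 35.0:
        return "moderate"
    return "high"


# Hypothetical pH readings; not values from Table 5.
ph_values = [7.8, 8.0, 8.2]
cv = coefficient_of_variation(ph_values)
print(f"CV = {cv:.1f}% -> {variability_class(cv)}")
```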

Regression with deep learning (DNN) on randomly selected data, independent of soil classes
The samples in the dataset are clearly grouped into different soil classes. For deep learning, the training and test datasets were randomly selected without considering class information. The regression estimation of the "index" parameter using DNN was conducted (Figure 3). An R² value of 86.07% was achieved for RM2 after 1,500 epochs on the test dataset. The R² values on the test dataset for the other network models were 77.77% for RM1 and 85.61% for RM3. The number of network parameters in RM1 was insufficient, the number of network parameters in RM2 was at the optimum level, and the large number of network parameters in RM3 caused overfitting. Therefore, the best estimation was achieved with the network trained using RM2. The error rate decreased as the number of epochs increased. There was not much change after approximately 250 epochs.
According to the results obtained in regression estimation using DNN on the "yield" parameter in Figure 4, an R² value of 86.61% was obtained on the dataset after 1,000 epochs for RM3. The R² values for the other network models on the test dataset were 81.88% for RM1 and 79.61% for RM2. A higher estimation accuracy was obtained as the number of parameters in the network increased. Therefore, RM3 showed the highest R² value. The error rate decreased as the number of epochs increased. After approximately 50 epochs, the training error continued to decrease, whereas the validation error decreased more slowly.
According to the results obtained in regression estimation using DNN on the "NAI" parameter in Figure 5, an R² value of 84.53% was obtained on the dataset after 1,000 epochs for RM2. The R² values for the other network models on the test dataset were 81.74% for RM1 and 81.07% for RM3. The number of parameters in RM2 provided the best estimation success rate for NAI. The other two models also provided a high accuracy rate in the NAI estimation. The error rate decreased as the number of epochs increased. There was not much change after approximately 200 epochs.

According to the results obtained in regression estimation using DNN on the "SQI" parameter in Figure 6, an R² value of 87.80% was obtained on the dataset after 1,500 epochs for RM3. The R² values for the other network models on the test dataset were 83.83% for both RM1 and RM2. Therefore, RM3 showed the highest R² value. The error rate decreased as the number of epochs increased. There was not much change after approximately 200 epochs.
The study found that using the index, yield, NAI, and SQI soil characteristics as network outputs led to varying levels of model performance. As a result, a different model was recommended for each network output. All of the R² values obtained for estimating the index, yield, NAI, and SQI parameters were within acceptable bounds.

Regression using deep learning (DNN) on randomly selected data, dependent on soil classes
For deep learning, the training and test datasets were randomly selected depending on the soil class information. The results obtained in this way are given in Table 6. Selecting samples with class information taken into account yields more reliable results. Figure 7 shows the results obtained after 1,000 epochs when CM1 is used to classify the "class" information. An accuracy of 96.97% for training and 80.00% for testing was achieved. The classifier made errors in Class 0: around 56% of Class 0 samples were misclassified as Class 3. Figure 8 shows the results obtained after 1,000 epochs when CM2 is used to classify the "class" information. An accuracy of 95.96% for training and 88.00% for testing was achieved. The classifier again made errors in Class 0: around 33% of Class 0 samples were misclassified as Class 3. The accuracy obtained on the test dataset was higher with CM2. Therefore, CM2 should be used in soil classification studies.
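Selecting training and test samples "depending on the soil class information" is commonly done with a stratified split, in which each class contributes the same fraction of its samples to the test set. The sketch below shows that idea under stated assumptions: the helper name `stratified_split`, the 20% test fraction per class, the seed, and the class sizes are all hypothetical, and the study's exact sampling procedure is not given in the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so every class keeps about the same train/test ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)                  # fixed seed for repeatability
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_frac))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical class labels (0-3) with unequal class sizes.
labels = [0] * 10 + [1] * 20 + [2] * 30 + [3] * 40
train_idx, test_idx = stratified_split(labels)

test_counts = {c: sum(1 for i in test_idx if labels[i] == c) for c in range(4)}
print(test_counts)
```

With a purely random split, a small class can end up almost absent from the test set, which makes per-class error rates such as those reported for Class 0 unstable; stratification avoids this.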

Land quality assessment and model verification
In order to form a distribution map of LQIR values for each point identified by the deep learning system, a total of 15 interpolation models were applied, and the comparison of their RMSE values is given in Table 7. The lowest RMSE value in Table 7, 0.1095, was obtained with the Completely Regularized Spline model, which belongs to the Radial Basis Function group. Moreover, according to the classes in Table 3, a distribution map of LQIR consisting of 4 classes was created (Figure 9). According to the results obtained in the study, 64.9% of the total land was distributed between the suitable (S1) and medium-suitable (S2) classes for rice cultivation, while 26.5% was in the marginal class (S3). In addition, a very small part of this land (8.6%) was found to be unsuitable for paddy cultivation. The lands found to be unsuitable for rice cultivation were the At.1 mapping unit of the Adatepe soil series, classified as Vertic Calcixerept, with shallow soil depth and high slopes, and the Bz.1 and Bz.2 mapping units of the Boztepe soil series, classified as Vertic Haploxerept. The marginal suitability class in terms of land quality occurs on the Dc.2, Dc.3, Cd.1, Yc.3, and Kb.1 mapping units of the Dağmatoğlu, Çengeldüzü, Yücekyazısı, and Kumbaba soil series, which are classified as Aquic Haploxerept, Vertic Xerofluvent, and Typic Haploxerept. The most important limiting features of these soils are their salinity and coarse texture.
In order to verify the model, a field trial was conducted for two years in the different rice land quality classes identified within the study area. The yield values of all rice cultivars were affected by their location. The average yield values for the S1, S2, and S3 classes were found to be 7,197, 5,032, and 3,572 kg ha⁻¹, respectively. The difference between S1 and S3 was found to be 3,624 kg ha⁻¹. The highest yields were obtained in the S1 class from the Beser and Osmancik-97 varieties, with 8,166 and 8,078 kg ha⁻¹, respectively, while the lowest yield was obtained in the S3 suitability class from the Kızılırmak variety, with 1,505 kg ha⁻¹ (Table 7). According to the statistical analysis, grain yields were significantly affected by the land suitability class (LSC), which also affected the varieties differently (ANOVA, P < 0.001).
The results of the LSD test are shown in Table 8. For the S1 class, the ranking of paddy varieties by decreasing grain yield was Beser > Osmancik > Halilbey > Kizilirmak > Aromatik > 7721 > Sumnu > Duragan > Karadeniz > Gonen > Koral > Negis. As in the S2 class, Kizilirmak was also observed to have the lowest grain yield in the S3 class. According to grain yield, Beser, Osmancik-97, and Halilbey were the three best varieties. The worst varieties were Aromatik, Kizilirmak, and Negis. According to the results, S1 was determined to be the most suitable class for obtaining high grain yield, followed by the S2 and S3 classes.

Conclusion
Considering the land quality distribution for rice, most of the land (64.9%) was found to be suitable for rice cultivation, while a very small part (8.6%) was found to have low land quality and to be unsuitable for rice cultivation. The decrease in soil quality due to intensive rice cultivation threatens the sustainability of rice agriculture in the Çorum-Osmancık region. Land quality classes, which are an important factor in agricultural production, have been prioritized in this study, and different physicochemical soil properties have been chosen as input parameters in order to conduct regression analysis and classification using deep learning. It was found that selecting the training and test samples in the dataset with class information taken into account produced high-performance results in estimating soil parameters and identifying land quality classes for rice. In addition, field trials were conducted in order to assess the accuracy of the defined land quality classes, and the test results showed that the differences between classes were statistically significant.

Figure 1. Soil sample pattern and location map of the study area.

Figure 2. Slope and soil map of the study area.

Figure 3. R² and error (MAE and RMSE) graphics obtained from RM2 network used for training and test data on the "index" parameter.

Figure 4. R² and error (MAE and RMSE) graphics obtained from RM3 network used for training and test data on the "yield" parameter.

Figure 5. R² and error (MAE and RMSE) graphics obtained from RM1 network used for training and test data on the "NAI" parameter.

Figure 6. R² and error (MAE and RMSE) graphics obtained from RM3 network used for training and test data on the "SQI" parameter.

Figure 7. Graphic of Accuracy and Confusion matrix for training and test data for CM1.

Figure 8. Graphic of Accuracy and Confusion matrix for training and test data for CM2.

YYU J AGR SCI 33 (1): 75-90. Şenyer et al. / Land Quality Index for Paddy (Oryza sativa L.) Cultivation Area Based on Deep Learning Approach using Geographical Information System and Geostatistical Techniques

Table 1. Analytical protocol measurements for indicators

Table 2. Rating factors for indicators of land quality for paddy cultivation

Table 3. Land quality index value for rice cultivation

Table 5. Descriptive statistics of some physicochemical properties of soil samples

Table 6. R² results of deep learning on randomly selected data dependent/independent of soil classes