Comparison of kriging interpolation precision between grid sampling scheme and simple random sampling scheme for precision agriculture

Received: 13.08.2015 Accepted: 18.09.2015

Sampling methods are important factors that can limit the accuracy with which spatial distribution patterns are predicted. A 10 ha tobacco-planted field was selected to compare the accuracy of grid sampling and simple random sampling (SRS) in predicting the spatial distribution of soil properties by ordinary kriging and cross-validation. To achieve this objective, we collected soil samples from the topsoil (0-20 cm) in March 2012. Grid sampling and SRS each comprised 115 sampling points. The accuracy of spatial interpolation under the two sampling schemes was then evaluated based on validation samples (36 points) and deviations of the estimates. The results suggested that soil pH and nitrate-N (NO3-N) had low variation, whereas all other soil properties exhibited medium variation. Soil pH, organic matter (OM), total nitrogen (TN), cation exchange capacity (CEC), total phosphorus (TP) and available phosphorus (AP) matched the spherical model, whereas the remaining variables fit an exponential model with both sampling methods. The interpolation errors of soil pH, TP, and AP were lowest with SRS. The interpolation errors of OM, CEC, TN, available potassium (AK) and total potassium (TK) were lowest with grid sampling. The interpolation precision of soil NO3-N showed no significant difference between the two sampling schemes. Considering our data on interpolation precision and the importance of minerals for the cultivation of flue-cured tobacco, the grid-sampling scheme should be used in tobacco-planted fields to determine the spatial distribution of soil properties. The grid-sampling method can be applied in a practical and cost-effective manner to facilitate soil sampling in tobacco-planted fields.


Introduction
Among the nutrients necessary for plant growth, potassium, nitrogen and phosphorus are the most important and sensitive for the yield and quality of flue-cured tobacco. Tobacco plants receive potassium, nitrogen and phosphorus not only from fertilizers, but also from soil minerals. However, soil is characterized by a high degree of spatial variability because of geological and soil-forming factors that operate at different intensities and scales (Goovaerts, 1998; Quine and Zhang, 2002). Therefore, uniform fertilizer application likely leads to over-application in areas with high nutrient levels and under-application in areas with low nutrient levels (Ferguson et al., 2002), which results in extremely uneven tobacco yield and quality. Variable-rate fertilizer application, which is an efficient method of solving such problems, is possible if the spatial variation in nutrient contents across a field is known (Cahn et al., 1994). The spatial distribution of soil properties can be predicted accurately by soil spatial sampling (Zhu et al., 2005). In soil sampling, point information is used to estimate soil fertility levels at locations where samples are not obtained. Soil sampling schemes are important because they affect the quality of spatial interpolation maps, and the value of such maps for management depends on the accuracy of the predicted values (Frogbrook, 1999). Therefore, a suitable sampling method is necessary to estimate correctly the values of soil properties in areas of specified sizes.
An optimal and time-effective sampling method in any study should provide maximum estimation precision at the lowest sampling cost. The number of soil samples and the sampling strategies required to represent field variability have been extensively studied since the 1920s (Lindsley and Bauer, 1929). This requirement is possibly based on economic values (Peck, 1990). Increased agricultural income from reduced input, increased yield, and improved quality should offset not only the costs of characterizing soil variability but also the cost of the technology required for variable application. Studies have indicated that appropriate sampling, or a combination that includes systematic sampling, is effective when the density of rare or clustered populations is estimated (Thompson and Seber, 1996; Christman, 2000). The sample pattern and sample spacing are important factors that affect the accuracy of kriging interpolation (Wollenhaupt et al., 1994; Gotway et al., 1996), a technique for estimating soil property values in unsampled areas and mapping their spatial distribution (Goovaerts, 2000). A few previous studies have evaluated the effect of sample design and intensity on the accuracy of assessment (McBratney et al., 1999; Caeiro et al., 2002; Zhu and Stein, 2006; Brus et al., 2006; Li et al., 2007; Kumar, 2009; Liu et al., 2010). That research provides a starting point for developing a sampling strategy to analyze effectively the spatial variability of soil properties in tobacco-planted fields.
Furthermore, sampling strategies should be carefully selected to represent fields in which known sources of variability are considered. Field observation has traditionally been based on discrete sampling procedures that use either grid sampling or simple random sampling (SRS) (Li et al., 2007). A grid pattern is commonly an optimum sampling scheme to ensure that an entire field is represented. However, sampling with a grid pattern can be laborious if large areas are examined (Li et al., 2007). Therefore, the present study compared kriging interpolation precision between the grid sampling method and SRS.
More than one million ha of agricultural soil in China are planted with flue-cured tobacco and serve as an important source of income for tobacco growers as well as local administrations. In order to improve the quality of flue-cured tobacco, the Tobacco Company of China has requested that farmers adopt variable-rate fertilizer application. The objectives of this study were: (i) to quantify the spatial variability of soil properties across flue-cured tobacco (Nicotiana tabacum) plantation fields, (ii) to identify the appropriate sampling scheme for nine soil properties in order to minimize cost and maximize evaluation accuracy, and (iii) to provide a theoretical basis for establishing a reasonable sampling scheme in tobacco-planting fields.

Study site
The study was performed at an agricultural experiment station in Southeast Pengshui County (29°8'14'4672"N, 107°57'3081"E) of Chongqing City in Southwest China in 2012. The site is characterized by a subtropical moist monsoon climate with an average annual temperature of 17.5 °C, a mean annual potential evapotranspiration of 950.4 mm, and an annual precipitation of approximately 1104.2 mm. The main soil type at this site ranges from light clay (80.6%) to heavy loam (3.5%), and the remaining 15.9% is medium clay. In addition, the soil is slightly acidic (pH = 5.87).

Soil sampling and laboratory analysis
A 10-hectare area was selected for soil sampling, and an overview of the boundary of the study site is shown in Figure 1. This area is surrounded by a hill, and the field was bare at the time of observation. Soil samples were collected on March 4, 2012. All of the samples were taken from the topsoil (0-20 cm), and a real-time kinematic global positioning system (GPS) survey was used to identify sampling locations. At each point, the position was recorded by differential GPS (DGPS) and then converted to coordinates (x, y; Fig. 1). The soil samples of the grid sampling scheme were collected using an approximately 32 m grid sampling design (n = 115, Fig. 1). The soil samples of the regular simple random sampling scheme were taken at spacings that ranged between approximately 16 m and 32 m (n = 115, Fig. 1). Soil samples were packed in plastic bags, air-dried, divided, and ground to pass through a 2 mm sieve before analysis. The soil pH of the samples was subsequently measured using a pH meter with a glass electrode (soil:water ratio = 1:2.5, w/v). The organic matter (OM) content was analyzed using the wet oxidation method of Walkley and Black (Nelson and Sommers, 1982). Cation exchange capacity (CEC) was determined by extraction using neutral sodium acetate (Chapman, 1965). Total nitrogen (TN) was determined using the Kjeldahl method (Bremner and Mulvaney, 1984). Nitrate nitrogen (NO3-N) of a fresh sample was determined using a continuous flow analyzer (Paramasivam et al., 2002). Total phosphorus (TP) was determined by sulfuric acid-perchloric acid heating digestion and Mo-Sb colorimetry (Lu, 1999). Available phosphorus (AP) was determined using the Olsen extraction method with alkaline sodium bicarbonate as the extractant at a ratio of 20:1 (Olsen et al., 1954). Total potassium (TK) was analyzed by inductively coupled plasma-atomic emission spectroscopy. Available potassium (AK) was determined using the neutral ammonium acetate method (Richards, 1954).
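The two sampling layouts described above can be illustrated with a short sketch. It is only a simplified illustration: the rectangular 400 m x 250 m field (10 ha), the function names, and the purely uniform random draw are assumptions, and the sketch does not reproduce the actual field boundary or the 115-point designs of the study.

```python
import random


def grid_points(width_m, height_m, spacing_m):
    """Nodes of a square grid with the given spacing, inset by half a cell."""
    points = []
    y = spacing_m / 2
    while y < height_m:
        x = spacing_m / 2
        while x < width_m:
            points.append((x, y))
            x += spacing_m
        y += spacing_m
    return points


def random_points(width_m, height_m, n, seed=0):
    """n locations drawn independently and uniformly over the rectangle."""
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    return [(rng.uniform(0, width_m), rng.uniform(0, height_m))
            for _ in range(n)]


# Hypothetical rectangular field of 400 m x 250 m = 10 ha
grid = grid_points(400.0, 250.0, 32.0)
srs = random_points(400.0, 250.0, 115, seed=1)
```

In a real field the candidate locations would additionally be clipped to the surveyed boundary before sampling.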

Evaluation method
In order to evaluate the accuracy of our estimates, 36 sites were selected by probability sampling for external validation. The interpolated values at these 36 points were compared with the actual measurements.
As an alternative method of evaluating the accuracy of our estimates, we determined the performance of each interpolation obtained using the two sampling methods by comparing the deviations of the estimates from the measured data through cross-validation (Webster and Oliver, 2001). The performance of the two sampling schemes was compared using the following statistics: mean absolute error (MAE), mean error (ME), mean square error (MSE), average standardized error (ASE), root mean square error (RMSE), and root mean square standardized error (RMSSE). The ME was used to determine the degree of bias in the estimates, and the RMSE provided a measure of the error size that is sensitive to outliers. MSE, ASE, and RMSSE were interpreted alongside them: ME and MSE should be close to zero, ASE should be close to RMSE, and RMSSE should be close to 1. The last five error statistics were used in the cross-validation analysis. They follow the definitions of Johnston et al. (2001), in which Ẑ(x_i) is the predicted value and Z(x_i) is the measured value at location x_i.
The measured variables in the data set were analyzed using descriptive statistical methods to obtain the mean, S.D., minimum, median, maximum, skewness, kurtosis, and coefficient of variation (CV) using SPSS 17.0 software. The distributions of these variables were tested for normality using Kolmogorov-Smirnov statistics. Semi-variance calculation, semi-variogram model fitting, and cross-validation were performed using the geostatistical software ArcGIS 9.3 for Windows.
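The five cross-validation statistics named in the evaluation method can be sketched as follows. The formulas are those commonly reported by ArcGIS Geostatistical Analyst and are an assumption here, because the equations themselves are not reproduced in the text. In particular, MSE is taken to be the standardized mean error and ASE the average kriging standard error; the function name and the `std_errors` argument (the kriging standard error at each point) are hypothetical.

```python
import math


def cross_validation_stats(measured, predicted, std_errors):
    """Return ME, MSE, ASE, RMSE and RMSSE for paired cross-validation results.

    Assumed definitions, with e_i = predicted_i - measured_i and
    s_i = kriging standard error at point i:
      ME    = mean(e_i)
      MSE   = mean(e_i / s_i)            (standardized mean error)
      ASE   = sqrt(mean(s_i ** 2))
      RMSE  = sqrt(mean(e_i ** 2))
      RMSSE = sqrt(mean((e_i / s_i) ** 2))
    """
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "ME": sum(errors) / n,
        "MSE": sum(standardized) / n,
        "ASE": math.sqrt(sum(s * s for s in std_errors) / n),
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "RMSSE": math.sqrt(sum(z * z for z in standardized) / n),
    }


# Made-up values, for illustration only
stats = cross_validation_stats(
    measured=[1.0, 2.0, 3.0],
    predicted=[1.5, 1.5, 3.0],
    std_errors=[0.5, 0.5, 0.5],
)
```

Under these definitions, an unbiased and well-calibrated interpolation gives ME and MSE near zero, RMSE close to ASE, and RMSSE close to 1.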

Descriptive statistics
Summary statistics of the soil parameters are shown in Table 1. The distributions of all variables were only slightly skewed (skewness < 1), and their median values were close to their mean values. In addition, the results showed that all data were normally distributed (Kolmogorov-Smirnov test, P > 0.05; Table 1). The greatest variability was observed in AK (32.92%), whereas the least variability was found in pH (7.67%). The variability in pH and NO3-N was low (CV < 15%), whereas all of the other soil variables exhibited moderate variation (CV: 15%-50%). AK also exhibits a localized pattern of enrichment according to the National Soil Survey Office (1993). In conclusion, the soil is highly variable, which results in extremely uneven tobacco yield, quality, and agricultural benefits and suggests that variable-rate fertilizer application is necessary for improving tobacco quality in this area.
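The CV classes used above can be expressed as a small helper. This is a minimal sketch: the thresholds of 15% and 50% come from the text, while the label "high" for CV above 50% and the function names are assumptions added for completeness.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def classify_cv(cv_percent):
    """Low (< 15%), moderate (15%-50%) or high (> 50%, assumed label)."""
    if cv_percent < 15:
        return "low"
    if cv_percent <= 50:
        return "moderate"
    return "high"


# CVs reported in the text for pH and AK
ph_class = classify_cv(7.67)
ak_class = classify_cv(32.92)
```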

Geostatistical analysis
Semi-variograms were calculated to identify the possible spatial structure of the different soil variables. Cross-validation was performed to compare the prediction performance of the geostatistical interpolation algorithms with a particular sampling method. Cross-validation indicators and additional model parameters (nugget, sill, and range) helped determine the optimal model of the prediction maps for each soil property (Isaaks and Srivastava, 1989). In this study, the optimal model that could describe the specific spatial structure of each property was identified. The results of the geostatistical analysis are shown in Table 2. In this analysis, several spatial distribution models were fitted, which revealed different levels of spatial dependence among the soil variables based on cross-validation.
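The semi-variogram calculation mentioned above can be sketched with the classical estimator, in which the semi-variance of a lag class is half the mean squared difference between all pairs of points separated by that lag. The study used ArcGIS 9.3 for this step; the function below, its name, and its binning rule are assumptions for illustration only.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (lag centre, semi-variance, pair count) for each non-empty lag.

    gamma(h) = sum((z_i - z_j) ** 2) / (2 * N(h)) over the N(h) pairs whose
    separation distance falls in the lag class [k * lag_width, (k + 1) * lag_width).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]


# Three made-up points on a 32 m transect
gamma_hat = empirical_semivariogram(
    coords=[(0.0, 0.0), (32.0, 0.0), (64.0, 0.0)],
    values=[1.0, 2.0, 4.0],
    lag_width=32.0,
    n_lags=3,
)
```

A model such as the spherical or exponential function is then fitted to these lag estimates to obtain the nugget, sill, and range.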
Our results suggested that NO3-N, TK, and AK matched an exponential model, whereas the remaining variables fit the spherical model with the two sampling methods. The coefficient of determination (R²) of all variables, except for NO3-N and AK, was greater than 0.90, indicating good fits. The spatial variability of soil nutrients may be affected by extrinsic factors (soil management practices such as fertilization and cultivation) and intrinsic factors (soil formation factors, such as soil parent materials). Strong spatial dependence of soil variables can typically be attributed to intrinsic factors (Cambardella et al., 1994). The nugget percentage (nugget %, the nugget expressed as a percentage of the sill) was used to classify spatial dependence. Our results showed that pH, OM, CEC, TN, and NO3-N had nugget % values below 25% (except NO3-N in the SRS), which suggested strong spatial dependence. Rodríguez et al. (2009) found that the spatial dependence of NO3-N was high in sandy and loam soils. All of the other soil variables were moderately spatially dependent, with nugget % ranging from 33.33% to 57.14%. The range parameter of the semi-variogram can be interpreted as the diameter of the zone of influence, that is, the average maximum distance over which the soil properties at two sampling points are related. The ranges of spatial dependence varied widely (from 95.78 m for OM with grid sampling to 558.41 m for AK with SRS, as shown in Table 2).
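The two semi-variogram models and the nugget % classification discussed above can be sketched as follows. Several details are assumptions: `sill` denotes the total sill (nugget plus partial sill), the exponential model is written with the practical range (factor 3 in the exponent), and the 75% boundary between moderate and weak dependence follows the convention of Cambardella et al. (1994) rather than a value stated in the text.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical model; reaches the sill exactly at the range."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    """Exponential model; reaches about 95% of the partial sill at the range."""
    if h <= 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def nugget_percent(nugget, sill):
    """Nugget as a percentage of the total sill."""
    return 100.0 * nugget / sill


def classify_dependence(pct):
    """Strong (< 25%), moderate (25%-75%) or weak (> 75%) spatial dependence."""
    if pct < 25:
        return "strong"
    if pct <= 75:
        return "moderate"
    return "weak"


# Illustrative parameters only; 95.78 m is the shortest range reported in the text
gamma_at_range = spherical(95.78, nugget=0.1, sill=1.0, rng=95.78)
dependence = classify_dependence(33.33)
```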

Accuracy of validation samples
The spatial distributions of the nine soil properties in the study area were interpolated with the optimal models using the two sampling schemes. The predicted values of the evaluation samples (115 points) and validation samples (36 points) were obtained using the two sampling schemes.
Scatter diagrams, in which the predicted values are on the vertical axis and the measured values are on the horizontal axis, were constructed and a trend line was subsequently added (Fig. 2). A trend line closer to the 45-degree line indicates a more accurate prediction (Li et al., 2008). In the scatter diagrams, the prediction accuracies of pH, TP, and AP in the SRS were higher than those of the grid-sampling scheme, while the prediction accuracies of soil OM, CEC, TN, NO3-N, TK and AK in the grid-sampling scheme were higher than those of the SRS.
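The comparison of a trend line with the 45-degree line can be made concrete with an ordinary least-squares fit of predicted on measured values. This is a minimal sketch with a hypothetical function name and made-up data; a slope near 1 and an intercept near 0 correspond to a trend line close to the 45-degree line.

```python
def trend_line(measured, predicted):
    """Least-squares slope and intercept of predicted (y) against measured (x)."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up validation data: predictions compressed toward the mean
slope, intercept = trend_line(
    measured=[1.0, 2.0, 3.0, 4.0],
    predicted=[1.5, 2.0, 2.5, 3.0],
)
```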

Analysis of interpolation results for SRS and Grid sampling
Results of cross-validation from the 115 points arranged in the SRS are shown in Table 3. These data may be more useful in comparing prediction errors. ME was always nearly equal to MSE, ASE was almost equal to RMSE, and RMSSE values were always close to 1. Consequently, kriging was shown to be accurate for soil pH, TP and AP by the five statistical criteria in the SRS, while the interpolation results for soil OM, CEC, TN, TK and AK in the SRS were poorer than those in the grid-sampling scheme. The difference in accuracy for soil NO3-N between the two sampling schemes was not marked.
For pH, TP, and AP, the absolute values of ME and MSE were slightly greater than zero, whereas RMSE was close to but less than ASE, and the RMSSE values were less than one, indicating that they were underestimated in the SRS. For TN and AK, the absolute values of ME, RMSE, ASE, and MSE were very close to zero, with RMSE > ASE and RMSSE > 1, suggesting that they were slightly overestimated in the grid-sampling scheme. However, OM and TK were underestimated by kriging in the grid-sampling scheme (RMSE < ASE, RMSSE < 1). Thus, the cross-validation results for soil OM, CEC, TN, TK and AK indicate that interpolation under the grid-sampling scheme was better than under the regular simple random sampling scheme, whereas soil pH, TP and AP were interpolated less accurately under the grid-sampling scheme. The interpolation results for soil NO3-N did not differ markedly between the two sampling schemes.

Interpolation Maps
Kriging interpolation was performed in ArcGIS 9.3 with the two sampling methods to obtain filled contour maps showing the spatial distribution and status of the soil properties. The contour maps of the soil properties are shown in Figure 3. For each of the soil properties, the spatial distribution trend was similar under the two sampling schemes. Analysis of the distributions of soil nutrients revealed that TP and AP have similar spatial distributions. High levels of TP and AP were detected in the east section, and low levels were observed in the southwest sections of the study area. The contour map of TK content showed the highest positional similarity with the kriged AK map. The distributions of pH, OM, and CEC were consistent, with high values in the middle section of the field.
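The ordinary kriging step behind such maps can be sketched for a single target location. This is not the ArcGIS implementation used in the study: the function names, the small Gaussian-elimination solver, and the example data and model parameters are all assumptions. The sketch solves the standard ordinary kriging system, in which the weights sum to 1 through a Lagrange multiplier.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical semi-variogram model (sill = nugget + partial sill)."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(coords, values, target, gamma):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(coords)
    # Semi-variances between samples, bordered by the unbiasedness constraint
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights


# Four made-up pH samples on a 32 m square; model parameters are illustrative
points = [(0.0, 0.0), (32.0, 0.0), (0.0, 32.0), (32.0, 32.0)]
ph = [5.5, 5.9, 6.1, 6.3]
model = lambda h: spherical(h, 0.0, 1.0, 100.0)
estimate, variance, weights = ordinary_kriging(points, ph, (16.0, 16.0), model)
```

Repeating the estimate over a regular mesh of target locations gives the values from which a contour map can be drawn.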

Discussion
Coefficients of variation (CVs) were very different among the variables, ranging from 7.67% (pH) to 32.92% (AK), indicating large heterogeneity in soil properties. Nitrogen (N), phosphorus (P) and potassium (K), which are the major minerals needed for tobacco growth and development, had particularly high CVs in this study area. This is consistent with the findings of Wang et al. (2009).
Our results differed from those reported by Gupta et al. (1999), who showed that soil NO3-N exhibits moderate variation. This is mainly because of different planting patterns, uneven crop growth and non-uniform management practices, which result in marked changes in topsoil over small distances. Therefore, site-specific management of nutrients may be necessary to achieve maximum economic and environmental benefits (Jiang et al., 2010). However, classical statistics do not show the spatial distribution of soil properties. The spatial distribution maps of soil nutrients based on geostatistical analysis provide a basis for variable-rate application.
It is important to determine the spatial dependence of soil properties because soil properties with strong spatial dependence are more readily managed (Jiang et al., 2010). The nugget-sill ratio can be used to classify the spatial dependence of soil properties (Jiang et al., 2011). The ratios for pH, OM, CEC, and TN were less than 25%, indicating that these four soil variables had strong spatial dependence and that structural factors strongly influenced the spatial variability of these properties (Cambardella et al., 1994). Consistent with other reports, the spatial dependence of pH, TN, OM, and CEC was fairly strong (Jiang et al., 2010; Liu et al., 2008). Meanwhile, all other soil variables, except for NO3-N with grid sampling, were moderately spatially dependent, with nugget/sill ratios ranging from 33.33% to 42.60%. The ranges of spatial dependence, which varied from 95.78 m for OM with grid sampling to 558.41 m for AK with SRS, were larger than the smallest sampling distance (16 m), suggesting that the two sampling schemes can satisfy the requirements of spatial variability structure analysis of soil properties in this study. Knowledge of the range of influence for various soil variables allows for the construction of independent data sets that can be used for classical statistical analysis. The small range for OM indicated that a small sampling interval is required for this property. For the same soil properties, the nugget-sill ratios of pH, CEC and NO3-N with grid sampling were lower than those with SRS (Table 2). The other soil properties showed the opposite pattern. Results of cross-validation of the soil properties indicated that a reasonable sampling scheme for pH, CEC and NO3-N at the site was SRS, whereas that for the remaining soil properties was grid sampling. Therefore, we can speculate that the sampling scheme with the larger nugget-sill ratio was optimal within a certain range of nugget-sill ratios. However, whether this is applicable to other soil properties or other fields must be verified further.
Variable-rate fertilizer application is only possible if the spatial variation in nutrient status across a field is known (Cahn et al., 1994). Geostatistics is an important tool for characterizing the spatial variability of soil properties, and it has been widely used in variable-rate fertilizer application. The accuracy of kriging interpolation also depends on the sample pattern and sample spacing (Gotway et al., 1996). Sampling design is also very important when the objective is to interpolate in an optimal fashion and to generate spatial distribution maps for soil properties within a region (Haining, 1990). Results of the present study revealed that the interpolation errors of soil OM, CEC, TN, TK and AK were the lowest with the grid sampling scheme.
On the other hand, the interpolation errors of soil pH, TP, and AP were the lowest with the SRS. The interpolation results for soil TN, CEC, AK, and TK obtained by grid sampling were better than those obtained by SRS, while the interpolation accuracy of soil NO3-N did not differ markedly between the two sampling schemes. Our results differ from the findings of Mallarino and Wittry (2004), who found that grid sampling was the most effective for phosphorus, potassium, pH, and organic matter. The discrepancy between these results and those reported in previous studies may be attributed to inconsistencies in agricultural experimental treatments among different studies.
Potassium is one of the major minerals needed for tobacco growth and development (Liu, 2003). The potassium content of tobacco leaves is highly correlated with tobacco leaf quality and cigarette safety, and potassium content is an important index for measuring tobacco leaf quality (Yang et al., 2007). Potassium deficiency is the most common problem in tobacco planting fields in China, and is seen as a major constraint on improving tobacco quality. In this study, we found that the interpolation errors of soil potassium data in the topsoil layer were lowest in the grid sampling scheme. Similarly, the grid-sampling scheme has previously been shown to be more effective for potassium (Mallarino and Wittry, 2004). Thus, it can be deduced that the optimal spatial sampling scheme is grid sampling if the land was previously planted with flue-cured tobacco. Additionally, plants derive nitrogen and phosphorus, which are important for tobacco plant growth and development, not only from fertilizers, but also from soil minerals.
Our results clearly showed that the prediction errors of soil OM, CEC, TN, TK, and AK were smaller in the grid-sampling scheme than in the SRS. The pH, TP, and AP had the smallest kriging errors in the SRS, and the interpolation errors of NO3-N were not markedly different between the two sampling schemes. On balance, we found that the grid sampling method is optimal for tobacco planting fields because the soil properties that are important for tobacco quality are predicted with better accuracy, and an accurate site-specific fertilization scheme for precision farming can be more easily developed using this method.
This study provides a theoretical basis for the practice of precision agriculture in tobacco-planted fields. For tobacco production, examining the distribution, abundance, and shortage of key nutrients is necessary, especially for nutrients crucial for tobacco such as potassium (Liu, 2003). If experience-based uniform fertilization management continues to be used, the variability of tobacco yield and quality will increase. Therefore, the key nutrient must be closely considered by the producer when determining the sampling scheme. From the analytical results of this study, the interpolation errors of AK and TK were the smallest with grid sampling, revealing that this was the best sampling scheme. In actual production, the grid-sampling scheme could be used as the basis for site-specific fertilizer management because potassium, which is the key nutrient for tobacco quality, is predicted with better accuracy. In the future, a reasonable sampling scheme should be selected in accordance with the spatial autocorrelation, trend effect, and anisotropy effect of soil properties, and with the constraints on budget, time, and workforce.

Conclusion
Classical statistical analysis showed considerable spatial variability of all soil properties. Soil pH and NO3-N had low variation (CV < 15%), whereas all other soil properties exhibited medium variation (CV: 15%-50%). Geostatistical analysis of soil properties indicated that pH, OM, TN, CEC, TP and AP matched the spherical model, whereas the remaining variables fit the exponential model with the two sampling methods. Classical statistical analysis and geostatistical analysis of soil properties revealed considerable spatial variability in the study area, suggesting that variable-rate fertilizer application is required. In addition, interpolation error analysis revealed that soil OM, CEC, TN, TK and AK had the smallest kriging errors in the grid sampling scheme. The smallest interpolation errors for pH, TP, and AP were obtained by SRS. Moreover, the interpolation precision of NO3-N was not markedly different between the two sampling schemes, while the SRS was slightly better for soil pH, TP, and AP, and grid sampling was slightly more appropriate for all other soil properties. Therefore, considering that potassium is one of the major minerals needed for tobacco, the grid sampling scheme should be used for variable-rate fertilizer application to improve the quality of flue-cured tobacco.

Figure 1. Soil sample distribution under two sampling schemes (a: grid sampling scheme, b: simple random sampling scheme) in the 10-ha area.

Figure 3. Smoothed contour maps produced by kriging for pH, OM, CEC, TN, TP, TK, NO3-N, AP, AK under two sampling methods in the 10-ha area (a: grid sampling scheme, b: simple random sampling scheme).
＃ CV, coefficient of variation (%). ※ K-S test, the Kolmogorov-Smirnov test was used to test the significance level of normality; all variables were normally distributed (P > 0.05).

Table 3. Results of cross-validation (ME, MSE, ASE, RMSE and RMSSE) from the 115 points in the simple random sampling scheme and the grid sampling scheme, respectively.