Prediction of soil organic carbon using VIS-NIR spectroscopy: Application to Red Mediterranean soils from Croatia

Received : 09.03.2017 Accepted : 24.05.2017

The objectives of this research were (i) to assess the accuracy of diffuse reflectance spectroscopy (DRS) in predicting soil organic carbon (SOC) content, and (ii) to determine the importance of wavelength ranges and specific wavelengths in the SOC prediction model. The reflectance spectra of 424 topsoil (0-25 cm) samples were measured in the laboratory using a portable Terra Spec 4 Hi-Res Mineral Spectrometer with a wavelength range of 350-2500 nm. Partial least squares regression (PLSR) with leave-one-out cross-validation was used to develop calibration models for SOC prediction. The coefficient of determination (R²), the concordance correlation coefficient (ρc), the ratio of performance to deviation (RPD), the range error ratio (RER) and the root mean square error (RMSE), with values of 0.83, 0.90, 2.22, 14.2 and 2.47 g C kg⁻¹ respectively, indicated a good model for SOC prediction. Predictions from the near-infrared (NIR) and short-wave infrared (SWIR) spectra were more accurate than those from the visible (VIS) and short-wave near-infrared (SWNIR) spectral regions. The wavelengths contributing most to the prediction of SOC were at 1925, 1915, 2170, 2315, 1875, 2260, 1910, 2380, 435, 1960, 2200, 1050, 1420, 1425 and 500 nm. This study has shown that VIS-NIR reflectance spectroscopy can be used as a rapid method for determining organic carbon content in Red Mediterranean soils, with an accuracy sufficient for rough screening.


Introduction
Soil organic carbon (SOC) is a fundamental constituent of the soil. It governs physical, chemical and biological processes in soils, and the decline of SOC content is one of the main threats leading to soil degradation. There is an increasing demand for large amounts of accurate SOC data for use in the protection and enhancement of the global environment, soil quality assessment and precision agriculture. In the last few decades, diffuse reflectance spectroscopy (DRS) has been considered a possible alternative to conventional laboratory soil analyses because of its cost-efficiency, ease of handling, rapidity and minimal sample preparation, and because of the development of chemometrics. Visible-near-infrared (VIS-NIR) spectroscopy provides a large amount of information on the organic and inorganic soil components. Absorption in the visible range provides a measure of soil colour, organic matter (Ben-Dor et al., 1999) and Fe minerals, mainly haematite and goethite (Sherman and Waite, 1985). Absorptions in the near-infrared (NIR) portion of the electromagnetic spectrum are associated with the stretching and bending of NH, OH and CH groups (Dalal and Henry, 1986; Clark, 1999; Viscarra Rossel and Behrens, 2010).
Published data indicate that different spectral ranges and wavelengths can be responsible for the prediction of SOC content (Brown et al., 2006; Viscarra Rossel et al., 2006a; Viscarra Rossel and Behrens, 2010). Sudduth and Hummel (1991) reported that near-infrared (NIR) reflectance data provided considerably better predictions of soil organic matter content than visible (VIS) data. Islam et al. (2003) evaluated the ability of reflectance spectroscopy in the UV, VIS and NIR ranges to predict several soil properties and demonstrated that the overall prediction of SOC was better with the whole spectral range (VIS-NIR) than with the NIR or VIS alone. Several studies have examined which spectral ranges are most suitable for predicting SOC concentration, considering the VIS, VIS-NIR, NIR and mid-infrared (MIR) (Viscarra Rossel et al., 2006b). They indicated that the MIR was more suitable than the NIR or VIS, but also noted that MIR spectroscopy is more expensive and complex than VIS-NIR measurement. The contribution of each wavelength of the VIS-NIR reflectance spectra to SOC prediction has been the subject of numerous studies (Ben-Dor and Banin, 1995; Dalal and Henry, 1986; Chang and Laird, 2002; Lee et al., 2009; Stenberg et al., 2010; Sarkhot et al., 2011; Xu et al., 2016), which identified different wavelengths as important for the estimation of SOC. Numerous studies have used VIS-NIR spectroscopy to analyse SOC content at the global (Brown et al., 2006; Viscarra Rossel et al., 2016), continental (Stevens et al., 2013), national (Vasques et al., 2010; Knadel et al., 2012; Shi et al., 2015; Wijewardane et al., 2016), regional (Ben-Dor and Banin, 1995; Islam et al., 2003; Lee et al., 2009; Summers et al., 2011; Sarkhot et al., 2011; Leone et al., 2012; Shi et al., 2015) and local/field scales (Kuang and Mouazen, 2012; Wetterlind et al., 2010; Fontán et al., 2011).
Most of these studies were conducted on sets of soil samples that were heterogeneous with respect to their geographical origin and to the soil-forming factors (climate, organisms, topography, parent material and time). The literature cited above shows that SOC estimates have been validated with a wide range of accuracy. These relatively large differences in the accuracy of SOC estimation are related to a number of factors (high SOC variability, the heterogeneity of soil types and parent material, land use and the size of the sampling area). Numerous studies have shown that geographically closer and more homogeneous sample sets with similar characteristics result in improved SOC prediction models (Vasques et al., 2010; Wijewardane et al., 2016). In this context, we decided to explore the accuracy of VIS-NIR spectroscopy for the prediction of SOC content using soil samples that are similar with respect to soil type, parent material and land use. We chose Red Mediterranean soils from Dalmatia, Croatia, developed on limestones and dolomites and used in agricultural production. The objectives of this work were (i) to assess the accuracy of diffuse reflectance spectroscopy and PLSR for SOC measurement and (ii) to determine the importance of different VIS-NIR spectral ranges and wavelengths for estimating SOC content.

Study area and soil data
In our analyses, we used the SOC content data and soil spectra of 424 topsoil samples selected from a spectral library of soils from Dalmatia (Miloš, 2013). Dalmatia occupies the middle part of the Adriatic coastal zone of Croatia. The SOC content was determined using the Kotzman method (JDPZ, 1966). The selected data belong to soils with a characteristic reddish-brown to red colour, commonly referred to as Red Mediterranean soil or Terra Rossa, developed on hard limestone and dolomite. According to the Croatian soil classification system, these soils belong to the class of Cambic soils. Their equivalents in the World Reference Base for Soil Resources (FAO, 2014) are Chromic Cambisols and Rhodic Cambisols. The climate of the study area is Mediterranean, with dry, hot summers and mild, moderately rainy winters. Agricultural land in this typical karst environment consists of a large number of scattered small areas and parcels of olive groves, vineyards, fruit orchards and abandoned agricultural land.

Spectra measurements and data pre-processing
The reflectance spectra of the soil samples were measured in the laboratory using a portable Terra Spec 4 Hi-Res Mineral Spectrometer with a wavelength range of 350-2500 nm. Soil samples were air-dried and sieved through a 2 mm sieve. A correction with a standardised white Spectralon® panel (Analytical Spectral Devices, Boulder, CO, USA) with 100% reflectance was made prior to the first scan and after every ten samples. All spectra were recorded at a 1 nm data spacing interval over the wavelength range of 400-2500 nm. The spectra were pre-processed using the first derivative with Savitzky-Golay smoothing (Savitzky and Golay, 1964) and a second-order polynomial fit. The spectral range of the soil spectra was first reduced to 400-2490 nm to eliminate the noise at the edges of each spectrum. This trimming was followed by averaging of adjacent wavelengths over 5 nm intervals. Thus, the total number of wavelengths used for VIS-NIR modelling was 419.
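The pre-processing steps described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the software actually used in the study: the function names, the order in which averaging and differentiation are applied, the Savitzky-Golay window width and the synthetic spectrum are all assumptions, since the text does not specify them.

```python
def trim(wavelengths, reflectance, lo=400, hi=2490):
    """Keep only bands with lo <= wavelength <= hi (nm)."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance) if lo <= w <= hi]
    return [w for w, _ in kept], [r for _, r in kept]


def average_bands(wavelengths, values, step=5):
    """Average adjacent 1 nm bands into step-nm bins, labelled by bin start."""
    start = wavelengths[0]
    bins = {}
    for w, v in zip(wavelengths, values):
        bins.setdefault(start + step * ((w - start) // step), []).append(v)
    labels = sorted(bins)
    return labels, [sum(bins[b]) / len(bins[b]) for b in labels]


def savgol_first_derivative(values, half_width=5, spacing=1.0):
    """Savitzky-Golay first derivative for a second-order polynomial fit.

    For a symmetric window the derivative weights of a quadratic fit are
    k / sum(k^2) for k = -m..m. Only points with a full window are returned.
    """
    m = half_width
    norm = sum(k * k for k in range(-m, m + 1)) * spacing
    return [
        sum(k * values[i + k] for k in range(-m, m + 1)) / norm
        for i in range(m, len(values) - m)
    ]


# Synthetic placeholder spectrum (a straight line), not measured data.
wl = list(range(350, 2501))
refl = [0.1 + 0.0001 * w for w in wl]

wl_t, refl_t = trim(wl, refl)                   # 400-2490 nm
bands, refl_5nm = average_bands(wl_t, refl_t)   # 5 nm bins
# Assumed window: 5 bands (half_width=2) at 5 nm spacing.
deriv = savgol_first_derivative(refl_5nm, half_width=2, spacing=5.0)
print(len(bands))  # 419 bins: 400, 405, ..., 2490
```

With 1 nm data from 400 to 2490 nm, binning by 5 nm gives bins starting at 400, 405, ..., 2490, which is consistent with the 419 wavelengths reported; the last bin contains only the 2490 nm value.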

Model Calibration and Validation
We used partial least squares regression (PLSR) with leave-one-out cross-validation to calibrate the spectral data against the reference data and to construct the SOC model. For more details on PLSR see e.g. Martens and Naes (1989) and Wold et al. (2001). The leave-one-out cross-validation method (Efron and Tibshirani, 1994) was used to determine the optimal number of factors to retain in the calibration model. The regression coefficients for each of the 419 wavelengths in the range of 400 to 2490 nm were calculated. The best SOC model was generated when the PLS analyses were run using only the significant wavelengths (p<0.05) identified with Martens' uncertainty test.
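The idea of choosing the number of PLSR factors by leave-one-out cross-validation can be illustrated with a small, dependency-free sketch. It implements single-response PLS (PLS1) with the NIPALS algorithm, which is one common way to fit PLSR and not necessarily the implementation used in this study. The function names and the synthetic, noise-free data are hypothetical, and Martens' uncertainty test is not included.

```python
import math
import random


def pls1_fit(X, y, n_factors):
    """Fit PLS1 with the NIPALS algorithm; returns what pls1_predict needs."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    factors = []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]                               # weights
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict one sample by deflating it factor by factor."""
    x_mean, y_mean, factors = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


def loo_rmsecv(X, y, max_factors):
    """Leave-one-out RMSE of cross-validation for 1..max_factors factors."""
    n = len(X)
    result = []
    for k in range(1, max_factors + 1):
        sq_err = 0.0
        for i in range(n):
            model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k)
            sq_err += (pls1_predict(model, X[i]) - y[i]) ** 2
        result.append(math.sqrt(sq_err / n))
    return result


# Synthetic, noise-free data purely for illustration (not the study's data).
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(12)]
y = [2.0 * r[0] - 1.0 * r[1] + 0.5 * r[3] for r in X]

rmsecv = loo_rmsecv(X, y, max_factors=4)
best_k = 1 + rmsecv.index(min(rmsecv))  # factor count with the lowest error
print(best_k, rmsecv)
```

In practice the factor count is often taken at the first clear minimum or plateau of the cross-validated error rather than at the global minimum, to limit overfitting; the sketch uses the global minimum only for brevity.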

Model Performance Evaluation
Predictive performance of the SOC model was evaluated by calculating the root mean square error of prediction (RMSEP), the ratio of performance to deviation (RPD), the range error ratio (RER), the coefficient of determination (R²) and the concordance correlation coefficient (ρc). RMSEP is defined as the square root of the average of the squared differences between the predicted and measured Y values of the validation objects (Equation 1):

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$ Eq. (1)

where $y_i$ and $\hat{y}_i$ are the measured and predicted values of sample i, respectively, and N is the number of samples.
The RPD was initially used by Williams (1987) for assessing the goodness of fit of NIR calibrations. The RPD is the ratio of the standard deviation of the reference data to the standard error of prediction (Equation 2):

$$\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{SEP}}$$ Eq. (2)

where SD is the standard deviation of the validation dataset and SEP is the standard error of prediction, that is, the RMSEP corrected for bias (Equation 3). Bias is the average value of the difference between predicted and measured values (Equation 4).

$$\mathrm{SEP} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\hat{y}_i - y_i - \mathrm{Bias}\right)^2}$$ Eq. (3)

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)$$ Eq. (4)

To interpret the RPD values we adopted the six-level interpretation given by Viscarra Rossel et al. (2006b). The RER (Equation 5) is the ratio of the range of the reference data to the SEP in the validation set (Starr et al., 1981).

$$\mathrm{RER} = \frac{\mathrm{Max} - \mathrm{Min}}{\mathrm{SEP}}$$ Eq. (5)

In Equation 5, Max and Min are the maximum and minimum values observed in the reference data. The RPD and RER are dimensionless statistics, meaning they can be compared on the same basis between different models. The RER values were interpreted according to the thresholds given by Malley et al. (2004) as follows: RER > 20 indicates an excellent prediction model; 15 ≤ RER ≤ 20, a successful model; 10 ≤ RER < 15, a moderately successful model; and 8 ≤ RER < 10, a moderately useful prediction model. To interpret the predictive performance of the SOC models in terms of the coefficient of determination (R²), we adopted the threshold values given by Saeys et al. (2005) as follows: an R² between 0.66 and 0.80 indicates approximate quantitative predictions, an R² between 0.81 and 0.90 reveals good prediction and an R² > 0.90 is considered an excellent prediction. The concordance correlation coefficient (ρc) was proposed by Lin (1989) for assessing the concordance (agreement) between two measures of continuous data. Like a correlation, Lin's concordance correlation coefficient (ρc) ranges from -1 to 1, where a value of 1 denotes perfect agreement; values > 0.90 suggest excellent agreement; values between 0.80 and 0.90, substantial agreement; values between 0.65 and 0.80, moderate agreement; and values < 0.65, poor agreement.
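The statistics defined in this section can be computed with a short function such as the sketch below. Several details are assumptions, because the text does not fix them: SEP is taken with an N − 1 denominator, SD is the sample standard deviation of the measured values, R² is the squared Pearson correlation between measured and predicted values, and Lin's ρc uses population (1/N) moments. The function name and example values are illustrative only.

```python
import math
import statistics


def validation_metrics(measured, predicted):
    """Return RMSEP, bias, SEP, RPD, RER, R2 and Lin's CCC as a dict."""
    n = len(measured)
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = sum(diffs) / n                                   # Eq. (4)
    rmsep = math.sqrt(sum(d * d for d in diffs) / n)        # Eq. (1)
    # Assumed N - 1 denominator for the bias-corrected error, Eq. (3).
    sep = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    rpd = statistics.stdev(measured) / sep                  # Eq. (2)
    rer = (max(measured) - min(measured)) / sep             # Eq. (5)

    # Population moments for Lin's concordance correlation coefficient.
    mean_m = statistics.fmean(measured)
    mean_p = statistics.fmean(predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((m - mean_m) * (p - mean_p)
              for m, p in zip(measured, predicted)) / n
    ccc = 2 * cov / (var_m + var_p + (mean_m - mean_p) ** 2)
    r2 = cov * cov / (var_m * var_p)  # squared Pearson correlation (assumed)

    return {"RMSEP": rmsep, "Bias": bias, "SEP": sep,
            "RPD": rpd, "RER": rer, "R2": r2, "CCC": ccc}


# Illustrative values only (not data from this study).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = validation_metrics(measured, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that SEP equals zero when every prediction is off by the same constant, in which case RPD and RER are undefined; a production version would need to guard against that case.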

Results and Discussion
The soil organic carbon statistics and soil spectral properties
The descriptive statistics of the soil organic carbon content determined by conventional laboratory analysis (reference dataset), and of its calibration and cross-validated PLSR predictions for the four spectral regions VIS (400-700 nm), NIR (700-2490 nm), SWNIR (700-1100 nm) and SWIR (1100-2490 nm) and for the combined VIS-NIR range (400-2490 nm), are given in Table 1. The SOC content of the reference dataset varies from 0.81 to 37.87 g C kg⁻¹ with an average value of 21.35 g C kg⁻¹ (Table 1).
Table 1 abbreviations: VIS-NIR, visible and near-infrared range (400-2490 nm); NIR, near-infrared range (700-2490 nm); VIS, visible range (400-700 nm); SWIR, short-wave infrared range (1100-2490 nm); SWNIR, short-wave near-infrared range (700-1100 nm).
The negative skewness of the reference, calibration and VIS-NIR, NIR and SWIR validation datasets indicates a slightly asymmetrical distribution with a long tail to the left. The skewness values for the validation VIS and SWNIR spectral regions are less than -1.0, which indicates substantial skewness and a distribution that is far from symmetrical. Figures 1 and 2 show the mean reflectance spectra and the mean first-derivative reflectance spectra of the soil samples that were used to develop the PLSR calibration model. The overall reflectance spectrum (Figure 1) has the shape of a typical soil spectrum, with reflectance increasing with increasing wavelength in the visible range (400-700 nm), and does not contain distinct or sharp peaks that can be directly associated with specific constituents. In the visible range, the mean first-derivative reflectance spectrum (Figure 2) shows two obvious absorption bands near 430 and 565 nm, which can be attributed to the presence of chromophorous constituents, mainly iron oxides (Ben-Dor et al., 1999).
The mean reflectance spectra (Figure 1), as well as the mean first-derivative spectra (Figure 2), show a weak concave shape at wavelengths around 800-970 nm. This concavity can be attributed mainly to the iron oxide content, since the soil samples have a low organic matter content and organic matter can otherwise mask the effect of Fe (Demattê et al., 2004). Furthermore, the mean reflectance and first-derivative spectra (Figure 1 and Figure 2) show several important absorptions around 1400, 1900 and 2200 nm. In addition, the first-derivative spectra show a strong absorption around 2400 nm (Figure 2). These absorptions can be attributed to water molecule vibrations and OH groups (1400 nm), water (1900 nm) and mineral influences (2200 nm and 2400 nm).

Performance assessment of calibration and validation models
Statistical descriptions of the PLSR calibrations and of their cross-validated SOC predictions for the different spectral ranges are given in Table 2. The best SOC prediction model for each spectral range was generated when the PLSR analyses included five factors. The PLSR SOC model with the lowest RMSEP and the highest R², ρc, RPD and RER is considered the best model. The best prediction of SOC was obtained from the VIS-NIR spectra, with validation R², ρc, RPD, RER and RMSEP values of 0.83, 0.90, 2.22, 14.2 and 2.47 g C kg⁻¹, respectively.
Table 2 abbreviations: R², coefficient of determination; RMSEP, root mean square error of prediction; SEP, standard error of prediction; ρc, concordance correlation coefficient; RPD, ratio of performance to deviation; RER, range error ratio.
According to the threshold values for the coefficient of determination (R²) given by Saeys et al. (2005), the R² values of the SOC validation models for the spectral ranges (Table 2) can be interpreted as follows: the SOC model developed using the VIS-NIR spectral range reveals good prediction; the NIR and SWIR regions indicate approximate quantitative predictions; and the calibrations for the VIS and SWNIR regions are only able to discriminate between high and low SOC values. Lin's ρc values of the SOC validation models (Table 2) suggest good agreement for the VIS-NIR range, substantial agreement for the NIR and SWIR regions, and poor agreement between the reference data and the cross-validated SOC values for the VIS and SWNIR regions. According to the interpretation given by Viscarra Rossel et al. (2006b), it follows that the VIS-NIR spectral range with the RPD value of 2.22 (Table 2) [...] The RER values (Table 2) were interpreted according to the thresholds given by Malley et al. (2004) as follows: the SOC models created using the VIS-NIR and NIR ranges (10≤RER≤15) indicate moderately successful prediction; the SWIR spectral range, with 8≤RER<10, indicates a moderately useful prediction model; and the models created using the VIS and SWNIR regions, with RER<8, indicate poor prediction. The RMSEP is a direct measure of the average uncertainty that can be expected when predicting new samples. For future predictions, an interval of ±2 × RMSEP around the predicted value can be taken as an approximate 95% confidence interval for the true value. The RMSEP values of the SOC prediction models (Table 2) ranged from 2.47 g C kg⁻¹ (VIS-NIR) to 4.53 g C kg⁻¹ (VIS). Taking the mean SOC value of 21.36 g C kg⁻¹ (Table 1) as the prediction, there is a 95% chance that the SOC content, as measured in the laboratory, lies between 16.42 and 26.30 g C kg⁻¹ for the VIS-NIR model and between 12.30 and 30.42 g C kg⁻¹ for the VIS model. The predictive performance of the SOC prediction model in this study (Table 2) is similar to that obtained by Gras et al. (2014), Wijewardane et al. (2016) and Knadel et al. (2012), who achieved R² values of 0.82, 0.83 and 0.81 and RPD values of 2.40, 2.41 and 2.40, respectively. Some authors (Brown et al., 2006; Chang and Laird, 2002; Viscarra Rossel et al., 2016) reported a higher accuracy of SOC prediction models than ours, with R² values of 0.87, 0.89 and 0.89, respectively. Leone et al. (2012) obtained a high accuracy (R² of 0.84-0.93 and RPD of 2.36-2.53) for local SOC predictive models in the soils of southern Italy. The study of Kuang and Mouazen (2012) at the farm scale in four different European countries showed a very wide range of R² and RPD values, 0.12-0.96 and 1.07-4.95, respectively. Wetterlind et al. (2010) reported a very good prediction of SOM (R² = 0.89, RMSE = 4.70 g kg⁻¹ and RPD = 3.0) for a farm-scale calibration model using only 25 soil samples.
However, numerous studies achieved a lower predictive capacity of SOC models than ours, e.g. Gao et al. (2014), Shi et al. (2015), Summers et al. (2011) and Fontán et al. (2011). In these studies R² ranged from 0.55 to 0.79 and RPD from 1.80 to 2.01. These large differences in the accuracy of SOC estimates are related to a number of factors (high SOC variability, the heterogeneity of soil types and parent material, land use and the size of the sampling area) whose individual effects cannot be easily identified. In general, all the diagnostic parameters of the cross-validated predictions (R², Lin's ρc, RPD, RER and RMSEP) given in Table 2 showed that the SOC prediction model created using the combined VIS-NIR spectra provides the most accurate predictions. In addition, predictions from the NIR and short-wave infrared (SWIR) spectra were more accurate than those from the VIS and short-wave near-infrared (SWNIR) spectral regions. This is in accordance with previous findings, e.g. Sudduth and Hummel (1991), who reported that NIR reflectance data provided considerably better predictions of soil organic matter content than data in the visible range. Islam et al. (2003) demonstrated that the overall prediction of SOC was better with the whole VIS-NIR spectral range than with the NIR alone.
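The interpretation rules used in this section can be written down as a short sketch. The RER classes (after Malley et al., 2004) and R² classes (after Saeys et al., 2005) are taken as quoted in the Model Performance Evaluation section; how the boundary values (for example RER = 15, or R² between 0.80 and 0.81) are assigned is an assumption, as are the function names. The last function reproduces the ±2 × RMSEP interval arithmetic for the mean SOC value of 21.36 g C kg⁻¹.

```python
def classify_rer(rer):
    """RER classes after Malley et al. (2004), as quoted in this paper."""
    if rer > 20:
        return "excellent"
    if rer >= 15:          # RER = 15 assigned to 'successful' (assumption)
        return "successful"
    if rer >= 10:
        return "moderately successful"
    if rer >= 8:
        return "moderately useful"
    return "poor"


def classify_r2(r2):
    """R2 classes after Saeys et al. (2005), as quoted in this paper."""
    if r2 > 0.90:
        return "excellent"
    if r2 >= 0.81:
        return "good"
    if r2 >= 0.66:         # values in 0.80-0.81 fall here (assumption)
        return "approximate quantitative"
    return "below the quantitative thresholds quoted"


def interval_95(value, rmsep):
    """Approximate 95% interval taken as value +/- 2 x RMSEP."""
    return round(value - 2 * rmsep, 2), round(value + 2 * rmsep, 2)


# Values reported for the VIS-NIR model in this study.
print(classify_r2(0.83), "|", classify_rer(14.2))
print(interval_95(21.36, 2.47))  # VIS-NIR model
print(interval_95(21.36, 4.53))  # VIS model
```

The two intervals come out as 16.42 to 26.30 and 12.30 to 30.42 g C kg⁻¹, matching the ranges stated in the text.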

Identification of the important wavelengths for the SOC prediction
The contribution of each wavelength of the VIS-NIR spectral range to the SOC prediction model, expressed as the PLSR regression coefficient, is shown in Figure 3. The best SOC prediction model included only those wavelengths that Martens' uncertainty test showed to contribute significantly (p < 0.01) to the prediction. Figure 3 illustrates that the final SOC model retained a very large number of wavelengths (176 in total) and that significant wavelengths with large regression coefficients were observed throughout the spectrum. In the visible range, the significant wavelengths with the maximum regression coefficients were observed as individual peaks at 430 and 435 nm and as a broad spectral range from 500 to 590 nm, with maximum peaks at 500 and 555 nm (Figure 3). In the SWNIR (700-1100 nm) spectral region there is a noticeable broad region around 800 nm with maximum contributions at 815, 830 and 840 nm. Wavelengths in these spectral regions are associated with chromophorous constituents: iron oxides, mainly haematite and goethite (Sherman and Waite, 1985), and organic matter (Ben-Dor et al., 1999). The highest correlation coefficients were observed in the short-wave infrared (SWIR, 1100-2500 nm) region (Figure 3). This spectral region is characterised by absorption bands around 1400 nm (max. peak at 1420 nm), 1900 nm (max. peak at 1925 nm), 2200 nm and 2300 nm, due to OH and water (1400 nm and 1900 nm) and mineral influences (2200 and 2300 nm). These absorption bands are also associated with the overtones and combination absorptions of C-H, O-H, S-H, C=O and N-H (Dalal and Henry, 1986; Clark, 1999; Viscarra Rossel and Behrens, 2010) and consequently contain information about the organic composition of the scanned soil sample. The high contribution of wavelengths in the water absorption features is consistent with the findings of Viscarra Rossel et al. (2006a) and Sarkhot et al. (2011).
The wavelengths contributing most to the SOC prediction model are found around 2200 nm (max. peak at 2170 nm) and around 2300 nm (max. peaks at 2315 and 2380 nm), due to Al-OH bend plus O-H stretch combinations, which are diagnostic absorption features in clay mineral identification (Clark et al., 1990). Many other authors (Ben-Dor and Banin, 1995; Dalal and Henry, 1986; Stenberg et al., 2010; Viscarra Rossel and Behrens, 2010) have also identified the bands around 2200 and 2300 nm as being important for SOC calibration.
The 15 wavelengths with the highest regression coefficients in the final five-factor SOC model for the VIS-NIR range were, in descending order: 1925, 1915, 2170, 2315, 1875, 2260, 1910, 2380, 435, 1960, 2200, 1050, 1420, 1425 and 500 nm. This is consistent with other research findings. For example, Lee et al. (2009) reported 450-465, 965, 1409 and 1775-2200 nm as important wavelengths for soil organic carbon prediction. Sarkhot et al. (2011) observed significant wavelengths throughout the spectrum, with the magnitude of the PLSR coefficients being higher around 400, 1400, 1900 and 2200 or 2300 nm. Chang and Laird (2002) reported that the wavelengths important for organic carbon were in the range between 1770 and 2500 nm. This study showed that twelve of the fifteen most significant wavelengths were greater than 1100 nm. The advantage of wavelengths above 1100 nm is that they are uncorrelated with Fe oxides, which reduces the problem of collinearity with organic matter. This is consistent with previous research (Ben-Dor and Banin, 1995; Dalal and Henry, 1986; Stenberg et al., 2010; Xu et al., 2016), which identified wavelengths around 1100, 1600, 1700 to 1800, 2000 and 2200 to 2400 nm as being particularly important for SOC calibration.
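The count of important wavelengths per spectral region can be checked with a few lines of code. The sketch below uses the fifteen wavelengths listed above and the region limits defined in this paper; assigning the shared boundary values (700 and 1100 nm) to the longer-wavelength region is an assumption, and it does not affect the result here.

```python
from collections import Counter

# The fifteen wavelengths (nm) with the highest regression coefficients.
TOP15 = [1925, 1915, 2170, 2315, 1875, 2260, 1910, 2380,
         435, 1960, 2200, 1050, 1420, 1425, 500]


def region_of(wavelength):
    """Assign a wavelength (nm) to VIS, SWNIR or SWIR."""
    if wavelength < 700:
        return "VIS"      # 400-700 nm
    if wavelength < 1100:
        return "SWNIR"    # 700-1100 nm
    return "SWIR"         # 1100-2490 nm


counts = Counter(region_of(w) for w in TOP15)
above_1100 = sum(1 for w in TOP15 if w > 1100)
print(dict(counts), above_1100)
```

Twelve of the fifteen wavelengths fall in the SWIR region, two in the VIS (435 and 500 nm) and one in the SWNIR (1050 nm).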
In conclusion, (i) all diagnostic parameters of the cross-validated predictions (R², Lin's ρc, RPD, RER and RMSEP, with values of 0.83, 0.90, 2.22, 14.2 and 2.47 g C kg⁻¹, respectively) indicated a good model for predicting organic carbon in Red Mediterranean soil; (ii) the SOC prediction model created using the combined VIS-NIR spectra provided the most accurate SOC predictions; (iii) predictions from the NIR and short-wave infrared (SWIR) spectra were more accurate than those from the VIS and short-wave near-infrared (SWNIR) spectral regions; and (iv) the wavelengths contributing most to the prediction of SOC content were at 1925, 1915, 2170, 2315, 1875, 2260, 1910, 2380, 435, 1960, 2200, 1050, 1420, 1425 and 500 nm. Taking into account the predictive statistics and accuracy of the model created, it can be considered sufficient for rapid and rough screening of organic carbon in Red Mediterranean soils.