Non-destructive Estimation of Chlorophyll a Content in Red Delicious Apple Cultivar Based on Spectral and Color Data

Non-destructive estimation of the chemical properties of fruit is an important goal for researchers in the food industry, since online operations, such as packaging fruit according to the level of a given chemical property or determining the stage of handling, depend on these estimates. In this study, the chlorophyll a content of the Red Delicious apple cultivar, a chemical property that changes with the ripening stage, is predicted by combining non-destructive spectral and color methods. Two methods were used to obtain the non-destructive estimate of chlorophyll a content: a hybrid Multilayer Perceptron Neural Network - Artificial Bee Colony Algorithm (ANN-ABC) and Partial Least Squares Regression (PLSR). Various pre-processing algorithms were applied with the PLSR method. To validate the hybrid ANN-ABC predictive method statistically, 20 runs were performed. The best regression coefficient of the PLSR method in predicting chlorophyll a content from spectral data alone was 0.918. The average determination coefficient of the hybrid ANN-ABC over 20 repetitions was higher than 0.92±0.040 using spectral data and 0.89±0.045 using color features, which to our knowledge is a remarkable non-intrusive estimation result.


Introduction
DOI: 10.15832/ankutbd.523574
The production and consumption of fruit and vegetables are rising due to their flavor and healthy properties (Li et al 2016). There are several methods for grading fruit, including grading by shape, volume, weight, and physicochemical properties (Sabzi et al 2015). Grading fruit and vegetables by physicochemical properties provides a wide choice for consumers (Zhang et al 2018). These characteristics have traditionally been estimated with destructive methods, whose destructive nature makes it almost impossible to package fruit according to the level of any physicochemical property. Non-destructive methods, such as light spectroscopy, make it possible to package fruit and vegetables according to the level of any physicochemical property (Arendse et al 2018). Another application of these non-destructive methods is the recognition of different fruit varieties, including apple (Eisenstecken et al 2019). Mohammadi et al (2015) used an imaging method to estimate the ripening time of persimmon fruit. They divided the whole ripening process of this fruit into three stages: unripe, ripe and overripe. Their algorithm performs classification based on the color features of the persimmon skin in two color spaces, RGB and L*a*b*. To verify the reliability of the proposed algorithm, they used a number of physical, mechanical, and chemical properties. They found that color features such as R, G, and b* differed significantly between the stages of ripening. Finally, two classifiers, based on Linear Discriminant Analysis and Quadratic Discriminant Analysis (QDA), were used to categorize the ripening stages of persimmon fruit. Results showed that, using QDA, the ripening stages of persimmons were estimated with a precision of 90.24%.
Banana is among the most consumed fruit around the globe. Among all food products, banana ranks fourth (Seymour et al 1993). Ripening stage is one of the key factors in post-harvest operations and in fruit processing in general. This factor determines consumer acceptability and eating quality (Thompson & Burden 1995). Optical methods are a kind of non-destructive method that has attracted the interest of several researchers in recent years (Ali et al 2017; Adebayo et al 2016a). For example, Adebayo et al (2016b) proposed a method for the non-destructive estimation of quality characteristics and the classification of banana ripening stages using optical properties. Five wavelengths (532, 660, 785, 830 and 1060 nm) were used to predict fruit quality features. Results showed a high correlation between optical properties and banana ripening stages at the 532, 660, and 785 nm wavelengths, so that the different ripening stages of banana fruit were predictable with an accuracy of 53.97%.
Apple is also a very important garden fruit with high demand around the world. For instance, Cardenas-Perez et al (2017) estimated the physicochemical properties of Malus domestica cv. Golden Delicious apples by a non-destructive method using a computer vision system. A total of 114 apple samples were used to develop the system. Various color spaces were investigated to extract color features, and the most suitable color space for measuring the color of the fruit was found to be CIELab. The CIELab coordinates are L* (luminosity), a* (green to red), and b* (blue to yellow). The physicochemical properties extracted from each sample were titratable acidity, total soluble solids, firmness, and ripening index. Results showed that, by combining color features and chemical properties extracted at different stages of apple ripening, a non-destructive evaluation of the fruit would be possible (Cardenas-Perez et al 2017).
The visible/near-infrared hyperspectral imaging (HSI) technique is one of the most effective non-destructive methods for estimating physicochemical properties and, from these, the fruit ripening stage. In recent years, the HSI method has been widely used to assess food quality and safety (Jackman et al 2009; Wu & Sun 2013). Hu et al (2017) investigated, in a non-destructive manner, the effect of 1-methylcyclopropene (1-MCP), one of the most effective ethylene reaction inhibitors, which delays the aging of fruit and vegetables, on preserving fruit quality and on the levels of glucose, fructose, and sucrose at different ripening stages of kiwi fruit. To test the system, 210 fruit in the range of 0.09-0.13 kg with no visible defects were selected. A Vis/NIR laboratory HSI system (Imspector V10E, Spectral Imaging Ltd., Oulu, Finland) was set in reflection mode. The wavelength range of the spectrometer was set at 308-1105 nm, with a resolution of 1002×1004 pixels (spatial × spectral). During the acquisition of hyperspectral images, each kiwi fruit was placed in position and moved by a stepper motor at a speed of 1.3×10⁻³ m s⁻¹, with an exposure time of 30×10⁻² s, under the objective of a CCD camera. Results showed that glucose, fructose, and sucrose values can be properly modeled using spectral data alone.
In recent years, non-destructive methods have been of interest to researchers around the world due to their practical applications in the food industry (predicting the amount of physicochemical properties, detecting the ripening stages of fruit and vegetables, etc.). Therefore, in this study, two non-destructive methods, spectral and color-based, combined with the hybrid ANN-ABC were used in order to estimate the amount of chlorophyll a as an index of ripening at different growth stages of the Red Delicious apple cultivar, chosen because its skin color varies in the visible range across the stages of ripening.

Sampling
Spectral, color, and chemical data were collected from 56 Red Delicious apple fruit samples that were harvested at four different stages of ripening (14 samples per stage). The first, second, third and fourth harvesting stages were 135, 145, 155 and 165 days after full bloom, respectively. Apples were obtained from gardens in Kermanshah, Iran (longitude: 7.03 °E; latitude: 4.22 °N). They were then transferred to Shahid Beheshti University to measure the spectral data, and to the Agricultural Engineering Research Institute, Iran, to extract the color features and measure the chlorophyll a content.

Spectroscopy system configuration
Electromagnetic radiation striking a material is partly absorbed and partly reflected by it. This interaction between radiation and material is the source of the information acquired in spectroscopic studies. In fruit and vegetables, the absorption of this radiation is carried out by the C-H, O-H, and N-H chemical bonds (Nicolaï et al 2007). Proper spectrometric analysis requires the spectroscopy system to be configured first. An Intel Core i3 CFI PC (330M at 2.13 GHz, 4 GB of RAM, Windows 10) was used, equipped with SpectraWiz software (StellarNet Inc., Tampa, Florida) to store the resulting spectra on the computer. Measurements in this research were made in reflective mode. An EPP200NIR spectrometer (StellarNet, Tampa, Florida) with an InGaAs (Indium Gallium Arsenide) detector, an operating range of 200 nm to 1100 nm and a resolution of 1 to 3 nm was used, connected to the computer via a USB2 cable. A SLI-CAL tungsten halogen light source (StellarNet, Tampa, Florida) with a power of 20 watts was also used. A two-branch optical fiber guided light from the light source to the apple and from the apple to the spectrometer. Because of intense noise, the first 200 nm and the last 100 nm of the wavelength range were eliminated, so the studied spectral range was 400 nm to 1000 nm. Because of noise in the spectra, 16 of the 56 samples were also eliminated from the study, resulting in a total of 40 samples used for analysis.

Extraction of color features
For the colorimetric study, the parameters L*, a* and b* of the apple skin were measured with a CR-400 colorimeter (Konica Minolta, Japan), and the color purity index (C*) and hue angle (ha) were then calculated using Equations (1) and (2):

$$C^{*} = \sqrt{(a^{*})^{2} + (b^{*})^{2}} \qquad (1)$$

$$h_{a} = \arctan\left(\frac{b^{*}}{a^{*}}\right) \qquad (2)$$
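As an illustration, the two color indices can be computed from a colorimeter reading as sketched below. This assumes the standard CIELAB definitions of chroma and hue angle; the function names and the sample reading are hypothetical, and the use of `atan2` with a result in degrees is a convenience chosen here, not something stated in the study.

```python
import math


def chroma(a_star: float, b_star: float) -> float:
    """Color purity (chroma) C* from the CIELAB a* and b* coordinates."""
    return math.hypot(a_star, b_star)


def hue_angle_deg(a_star: float, b_star: float) -> float:
    """Hue angle in degrees, in [0, 360), measured from the +a* axis."""
    # Assumption: atan2 is used so that the quadrant stays correct when
    # a* is negative (greenish skin); a plain arctan(b*/a*) would not.
    return math.degrees(math.atan2(b_star, a_star)) % 360.0


# Hypothetical colorimeter reading, not a value from the study.
a_star, b_star = 3.0, 4.0
print(chroma(a_star, b_star))
print(hue_angle_deg(a_star, b_star))
```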

Measuring true Chlorophyll a content
Due to changes in the color of the fruit during the ripening stages as well as the change in the amount of chlorophyll a during this period, we believe that the non-destructive estimation of chlorophyll a is very useful, since it is possible to estimate the stage of fruit ripening, specifically Red Delicious apple cultivar (Costa et al 2009;Amoriello et al 2018). Destructive measurement of chlorophyll a content was done by the method used in Ncama et al (2017).
This method consists of the following steps:
1. 1 g of rind powder was extracted using 8 mL of 80% acetone.
2. The material was transferred to a glass tube that was covered with ice.
3. The glass tube was left to stand for 10 min in this condition.
4. The material was then homogenized for 1 min.
5. The homogenate was centrifuged for 10 min at 4 °C.
6. The pigment content was calculated from the absorbance at the required wavelengths (Equation (3)), where A is the absorbance of a sample at the subscript wavelength; e.g. A663.2 is the sample absorbance at 663.2 nm.
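The final calculation step can be sketched as follows. The coefficients are those of the widely used Lichtenthaler relation for extracts in 80% acetone, which also reads absorbance at 663.2 nm; whether Equation (3) of this study has exactly this form is an assumption, as are the function names, the second wavelength (646.8 nm), and the conversion to a per-gram basis from the 8 mL extract and 1 g sample.

```python
def chlorophyll_a_ug_per_ml(a_663_2: float, a_646_8: float) -> float:
    """Chlorophyll a concentration of the extract, in ug per mL.

    Assumption: Lichtenthaler coefficients for 80% acetone; the study's
    own Equation (3) may differ.
    """
    return 12.25 * a_663_2 - 2.79 * a_646_8


def chlorophyll_a_ug_per_g(a_663_2: float, a_646_8: float,
                           extract_ml: float = 8.0,
                           sample_g: float = 1.0) -> float:
    """Scale the extract concentration to ug per g of rind powder.

    Assumption: the absorbance is read on the undiluted extract.
    """
    return chlorophyll_a_ug_per_ml(a_663_2, a_646_8) * extract_ml / sample_g


# Hypothetical absorbance readings, not data from the study.
print(chlorophyll_a_ug_per_ml(0.5, 0.2))
print(chlorophyll_a_ug_per_g(0.5, 0.2))
```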

Non-destructive estimation of chlorophyll a content
For the non-destructive estimation of chlorophyll a content, two kinds of data were used: spectral and color. For spectral data, both the Partial Least Squares Regression (PLSR) method, using ParLeS software (ParLeS_v3.1) (Rossel 2008), and the hybrid ANN-ABC method were used. For color data, only the hybrid ANN-ABC method was used.

ParLeS software
ParLeS is chemometrics software for multivariate modeling and prediction. It is used for teaching, research and spectroscopy. The software can transform and pre-process the spectra received from different samples using various algorithms (Rossel 2008).

Hybrid artificial neural network-artificial bee colony algorithm (ANN-ABC)
The multi-layer perceptron (MLP) neural network is an effective method for nonlinear modeling of various characteristics. The performance of this network depends on the optimal setting of its adjustable parameters, which include the number of neurons and layers, the transfer function, the back-propagation training function and the back-propagation weight/bias learning function. In this study, an artificial bee colony algorithm was used to set these parameters. The bee algorithm is an optimization algorithm based on the swarming behavior of honey bees in search of food sources, as suggested by Pham et al (2006). The stages of the bee algorithm are as follows:
1. Generating initial responses and evaluating them.
2. Selecting the better sites (responses) and sending worker bees to those sites.
3. Returning the bees to the hive with an artificial dance (producing a neighbor response).
4. Comparing all the bees of a site and selecting the best one.
5. Replacing the non-selected bees with random answers.
6. Saving the position of the best answer, and returning to Step 2 if the termination conditions are not met (Pham et al 2006).
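The six stages listed above can be sketched on a toy continuous problem as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function and parameter names (`n_scouts`, `n_best`, `n_recruits`, `radius`) are hypothetical, and the cost function is a simple sphere function rather than the MSE of a trained network.

```python
import random


def bees_algorithm(cost, bounds, n_scouts=20, n_best=5, n_recruits=6,
                   radius=0.1, n_iter=50, seed=0):
    """Minimize `cost` over the box `bounds` with a basic bees-style search."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbor(site):
        # "Dance": perturb each coordinate within a patch around the site,
        # clipped to the search bounds.
        return [min(hi, max(lo, x + rng.uniform(-radius, radius) * (hi - lo)))
                for x, (lo, hi) in zip(site, bounds)]

    # Step 1: generate initial responses and evaluate them.
    population = sorted((random_point() for _ in range(n_scouts)), key=cost)
    best = population[0]
    for _ in range(n_iter):
        new_population = []
        # Steps 2-4: send workers to the better sites and keep the best
        # bee found at each site (the site itself stays a candidate).
        for site in population[:n_best]:
            candidates = [site] + [neighbor(site) for _ in range(n_recruits)]
            new_population.append(min(candidates, key=cost))
        # Step 5: replace the non-selected bees with random answers.
        new_population += [random_point() for _ in range(n_scouts - n_best)]
        population = sorted(new_population, key=cost)
        # Step 6: save the best answer found so far, then loop again.
        if cost(population[0]) < cost(best):
            best = population[0]
    return best, cost(best)


def sphere(x):
    """Toy cost function with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_cost = bees_algorithm(sphere, bounds=[(-5.0, 5.0)] * 2)
print(best, best_cost)
```

Because each selected site remains a candidate in its own neighborhood search, the best cost never increases from one iteration to the next.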
In this study, the number of neurons in each layer could be selected in the range of 0-25, and the number of layers could range from a minimum of 1 to a maximum of 3. The transfer function was selected from among 15 different types, such as tansig, logsig, hardlim, satlins and tribas. The back-propagation network training function was selected (sub-optimally) by the ABC algorithm from among 19 different functions, such as trainb, traingda, traincgb and trainlm. Finally, the back-propagation weight/bias learning function was selected from among 15 different functions, such as learnpn, learnk, learncon and learnwh. The method is summarized as follows. First, the ABC algorithm builds a candidate vector from the parameters listed above, with a minimum of 4 and a maximum of 8 members. For example, the vector x = [13, 17, logsig, tribas, trainb, learnpn] describes a network with two hidden layers of 13 and 17 neurons, the logsig transfer function in the first layer, the tribas transfer function in the second layer, the trainb back-propagation training function, and the learnpn back-propagation weight/bias learning function. The mean square error (MSE) measures the efficiency of the MLP neural network configured with each candidate vector sent to it by the ABC algorithm. In this way, the vector with the lowest MSE is selected as optimal and used to set the neural network parameters.
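To make the encoding concrete, the sketch below splits such a candidate vector back into network settings. It assumes the layout implied by the example vector (neuron counts first, then one transfer function per layer, then the training and learning functions); the function name `decode` and the dictionary keys are hypothetical.

```python
def decode(vector):
    """Split a candidate vector into MLP settings.

    Assumed layout: [neurons per layer..., transfer function per layer...,
    training function, learning function], so 1, 2 or 3 hidden layers give
    vectors of 4, 6 or 8 members.
    """
    if not 4 <= len(vector) <= 8 or len(vector) % 2:
        raise ValueError("expected a vector with 4, 6 or 8 members")
    n_layers = (len(vector) - 2) // 2
    return {
        "hidden_neurons": list(vector[:n_layers]),
        "transfer_functions": list(vector[n_layers:2 * n_layers]),
        "training_function": vector[-2],
        "learning_function": vector[-1],
    }


# The example vector from the text.
config = decode([13, 17, "logsig", "tribas", "trainb", "learnpn"])
print(config)
```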

Evaluation parameters of chlorophyll a content predictive model
To evaluate the efficiency of the chlorophyll a content estimation models built by the hybrid ANN-ABC, the coefficient of determination (R²), Sum Squared Error (SSE), Mean Absolute Error (MAE), MSE, and Root Mean Square Error (RMSE) were used (Sabzi et al 2013; Sabzi & Arribas 2018). To evaluate the models built by PLSR, the regression coefficient (Rp), adjusted regression coefficient (Rp.adj), RMSE, Standard Deviation of the Error Distribution (SDE), and Relative Percent Deviation (RPD) were used (Rossel 2008). Figure 1 shows a flowchart of the types of data and the methods used to estimate chlorophyll a.
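The error indices used for the hybrid ANN-ABC can be computed from measured and estimated values as in the sketch below. Two points are assumptions made here: R is taken as the Pearson correlation between measured and estimated values, and R² as its square. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R, R2, SSE, MAE, MSE and RMSE for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)            # sum squared error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mse = sse / n                               # mean square error
    rmse = math.sqrt(mse)                       # root mean square error
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)          # Pearson correlation
    return {"R": r, "R2": r * r, "SSE": sse, "MAE": mae,
            "MSE": mse, "RMSE": rmse}


# Hypothetical measured and estimated values, not data from the study.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```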

Reflectance and absorption spectra
Nicolaï et al (2007) showed that the reflectance spectra of different fruit, such as apples, oranges, nectarines, and pears, are similar. This means that the reasons behind the occurrence of peaks in the spectral diagrams of fruit are similar. Figure 2 shows the mean reflection and absorption spectra in the visible/near-infrared range. To reduce the nonlinearity that may exist in the spectrum, ParLeS software offers the possibility of converting the reflection spectra (R) into absorption spectra by log(1/R) (Rossel 2008). The spectra show peaks near the wavelengths of 490 and 680 nm. The peak near 490 nm is related to absorption by carotenoids, and the peak near 680 nm to absorption by chlorophyll a (Cayuela 2008; Martínez-Valdivieso et al 2014).
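The log(1/R) conversion mentioned above amounts to the following. The sketch assumes a base-10 logarithm and reflectance expressed as a fraction in (0, 1]; the text does not state the base, and the function name is hypothetical.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance A = log10(1/R) for each reflectance value R."""
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance values must be positive")
    return [math.log10(1.0 / r) for r in reflectance]


# Hypothetical reflectance values, not data from the study.
print(reflectance_to_absorbance([1.0, 0.1, 0.01]))
```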

Pre-processing of spectra
For several reasons, such as light scattering caused by changes in the spacing between the detector and the sample, surface roughness of the sample, variation in sample size, and noise caused by the rising temperature of the spectrometer, the spectral data contain unwanted information in addition to the sample information. With ordinary partial least squares regression, this unwanted information reduces the accuracy of the calibration model. To obtain more stable and reliable models, ParLeS software offers different methods of spectral preprocessing, such as Savitzky-Golay (SG), Standard Normal Variate (SNV), Wavelet Filter (WF), SNV with Wavelet Detrending (SNVWWD), Wavelet Detrending (WD), and Median Filter (MF), which are based on various mathematical operations (Rossel 2008).
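Two of the simpler preprocessing steps named above, SNV and the median filter, can be sketched in plain Python as follows. This illustrates the general techniques only; it is not a reproduction of the ParLeS implementation, and the use of the sample standard deviation, the window size, and the shrinking window at the spectrum edges are assumptions made here.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: center one spectrum and scale it to unit spread."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # assumption: sample standard deviation
    return [(x - mean) / std for x in spectrum]


def median_filter(spectrum, window=3):
    """Replace each point by the median of a window centered on it."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    filtered = []
    for i in range(len(spectrum)):
        # Assumption: the window simply shrinks at the two ends.
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        filtered.append(statistics.median(spectrum[lo:hi]))
    return filtered


# Hypothetical spectrum with two spikes, not data from the study.
raw = [1, 9, 2, 3, 10, 4]
print(median_filter(raw))
print(snv(raw))
```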

Figure 1- Flowchart of the types of data and the methods used to estimate chlorophyll a: spectral data (partial least squares regression (PLSR) and hybrid ANN-ABC) and color data (hybrid ANN-ABC)

Partial least squares regression (PLSR) efficiency in predicting the chlorophyll a fruit content
Since the purpose of this study is to estimate the amount of chlorophyll a and, as described in the preceding sections, the light absorption peak near 680 nm is directly related to chlorophyll a content, a wavelength window of 676.282 nm to 686.293 nm was used for estimating chlorophyll a content with PLSR. Table 1 shows the results of the spectral prediction of chlorophyll a content using PLSR. In general terms, the larger the Rp, Rp.adj, and RPD values and the smaller the RMSE of prediction (RMSEP) and SDE values, the more efficient the resulting estimation model and the better it predicts the amount of chlorophyll a. On this basis, the preprocessing models SNVWW+SG (model 4) and WD+MF (model 5) are better than the others and predict chlorophyll a content in fruit more accurately, that is, with less error.
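Restricting a spectrum to such a window is a simple filtering step, sketched below. The function name and the sample wavelength grid are hypothetical; only the window limits come from the text.

```python
def select_window(wavelengths, values, lo_nm, hi_nm):
    """Keep only the points whose wavelength lies in [lo_nm, hi_nm]."""
    pairs = [(w, v) for w, v in zip(wavelengths, values) if lo_nm <= w <= hi_nm]
    return [w for w, _ in pairs], [v for _, v in pairs]


# Hypothetical wavelength grid (nm) and absorbance values.
wavelengths = [670.0, 676.282, 680.0, 686.293, 690.0]
absorbance = [0.1, 0.2, 0.3, 0.4, 0.5]
window_wl, window_abs = select_window(wavelengths, absorbance, 676.282, 686.293)
print(window_wl, window_abs)
```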

Non-destructive estimation of chlorophyll a content on spectral data using Hybrid ANN-ABC
To predict chlorophyll a content with the hybrid ANN-ABC, a spectral window of 662.01 to 698.04 nm, which includes the wavelength of 680 nm, was used. As mentioned above, optimal adjustment of the MLP neural network parameters ensures its high performance in predicting chlorophyll a content. The optimal MLP parameters set by the artificial bee colony algorithm were a two-layer structure with 5 and 12 neurons, the tribas and purelin transfer functions for the first and second layers respectively, the traingda back-propagation network training function, and the learnos back-propagation weight/bias learning function. After determining the optimal structure of the artificial neural network, 20 replications were performed to validate the predictive method. The mean±std of R, SSE, MAE, MSE and RMSE were 0.962±0.040, 0.441±0.354, 0.129±0.061, 0.034±0.027 and 0.173±0.064 respectively. Among these 20 replications, the best values of R, SSE, MAE, MSE and RMSE were 0.987, 0.111, 0.062, 0.008 and 0.092 respectively. The mean determination coefficient of the 20 repetitions is thus higher than 0.92, with a standard deviation of 0.04. Therefore, it can be concluded that the performance of the hybrid ANN-ABC is much better than that of partial least squares regression. Figure 3(a) shows the regression scatter plot of the estimated values of chlorophyll a content against its true measured values for the test data. Each estimated value is the average of the estimates obtained for that sample over the 20 replicates. The regression coefficient is 0.975, which implies that the hybrid ANN-ABC method is stable on spectral data.
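Summaries of the form mean±std and best value over replications can be produced as in the sketch below. The function name and the three sample values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize(values, higher_is_better=True):
    """Return (mean, std, best) of one metric over repeated runs."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation
    best = max(values) if higher_is_better else min(values)
    return mean, std, best


# Hypothetical R values from three runs, not data from the study.
mean_r, std_r, best_r = summarize([0.90, 1.00, 0.95])
print(f"{mean_r:.3f}±{std_r:.3f}, best {best_r:.3f}")
```

For error indices such as MSE or RMSE, `higher_is_better=False` selects the smallest value as the best one.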

Non-destructive estimation of chlorophyll a content on visible-range color features using Hybrid ANN-ABC
After different color features were tried as inputs to the hybrid ANN-ABC, the results showed that two features, the second component of the L*a*b* color space (a*) and the hue angle, had a higher ability to predict chlorophyll a content than the other features when used as inputs to the artificial neural network. Therefore, these two color features were used to predict the amount of chlorophyll a present in the fruit samples. The optimal MLP parameters set by the artificial bee colony algorithm were a two-layer structure with 15 and 19 neurons, the satlin and tribas transfer functions for the first and second layers respectively, the trainoss back-propagation network training function, and the learnwh back-propagation weight/bias learning function. As in the previous section, after determining the optimal artificial neural network parameters, 20 replications were used to assess the stability of the methodology. The mean±std of R, SSE, MAE, MSE and RMSE were 0.945±0.045, 0.556±0.269, 0.142±0.032, 0.043±0.021 and 0.2±0.051 respectively. Among these 20 replications, the best values of R, SSE, MAE, MSE and RMSE were 0.982, 0.149, 0.072, 0.011 and 0.107 respectively. The performance of the hybrid ANN-ABC in estimating chlorophyll a content from color features is thus remarkable, although weaker than with spectral data. Figure 3(b) shows the regression scatter plot of the estimated mean against the true measured value of chlorophyll a content in apple (test set) using color data. The regression coefficient of this method over the test data is 0.969, which is an acceptable value for predicting the amount of chlorophyll a.

Comparison of hybrid ANN-ABC method performance in non-destructive estimation of chlorophyll a content using spectral and color data
Figure 4 (a) shows boxplots of the error indices (MAE, MSE and RMSE) of the hybrid ANN-ABC method in the non-destructive estimation of chlorophyll a content over 20 replicates, for both spectral and color data. All the error boxplot values for spectral data are lower than their color data counterparts. In addition, the boxplot of the mean square error is in general more compact for spectral data than for color data. This means that the hybrid ANN-ABC method performs better in the non-destructive estimation of chlorophyll a content with spectral data than with color data. Figure 4 (b) likewise shows the high performance of the hybrid ANN-ABC method in the non-invasive estimation of chlorophyll a content from spectral data: the boxplots of the regression coefficient and the coefficient of determination are both tighter and higher for spectral data than for color data. The partial least squares regression method performs poorly when no preprocessing is applied to the spectra, whereas the hybrid ANN-ABC method performs far better than partial least squares regression on the raw data. This matters in real-time applications, where time is a very important factor: when the spectra do not need to be preprocessed, chlorophyll a content can be estimated in less time. The next limiting factor in the non-destructive estimation of chlorophyll a is the cost of building the device.
The spectral method is far more expensive than the color method, since the color method needs only an algorithm and a typical visible-range camera to estimate chlorophyll a content non-destructively. As shown above, the performance of the hybrid ANN-ABC method in the non-destructive estimation of chlorophyll a content using color data is close to that obtained with spectral data. Therefore, considering both the cost and the performance of the hybrid ANN-ABC method with color and spectral data, it might be better to use color data in some applications, although there is always a trade-off between performance and cost. Table 2 compares the performance of the methods proposed in this study with other non-destructive methods for the estimation of chlorophyll a content in the literature. The proposed hybrid ANN-ABC method performs better than the other methods, although a direct comparison is not possible because the input fruit database is different in each case. It can be concluded that the proposed methodology offers high performance at limited cost in the non-destructive estimation of chlorophyll a content in Red Delicious apple fruit.

Method                                 Fruit    Regression coefficient
Proposed method using spectrum data    Apple    0.987
Proposed method using color data

Conclusions
In this study, two methods, partial least squares regression and the hybrid Artificial Neural Network-Artificial Bee Colony method, were used to estimate the amount of chlorophyll a, an important parameter correlated with the stage of fruit ripening (of the Red Delicious apple cultivar in the present study), using both spectral and color data. The most important results are summarized below:
1. We believe that the hybrid Multilayer Perceptron Artificial Neural Network - Artificial Bee Colony Algorithm (ANN-ABC) performs better than partial least squares regression because of its random nature in the training phase of the model predicting chlorophyll a content.
2. Since the amount of chlorophyll a can be predicted with rather high performance from color features using the hybrid ANN-ABC approach, this constitutes an inexpensive method that can be used in on-line conditions that do not need very high accuracy in the estimation.
3. Using spectral data for the non-intrusive estimation of chlorophyll a content does not require spectroscopy over the entire visible/near-infrared range; a small window around the 680 nm wavelength is sufficient. This reduces the cost of configuring and setting up the spectroscopy system.
4. When statistical methods such as partial least squares regression are used for the non-intrusive estimation of chlorophyll a content, selecting suitable methods for preprocessing the spectral data is important and ensures their high performance.