Low-Cost Classification of Close and Open Shell Antep Pistachio Nuts based on Image Analysis and Machine Learning

The effectiveness of post-harvest industrial processes is critical to maintaining the economic value of pistachio nuts, which play an essential role in the agricultural economy. Achieving this efficiency requires up-to-date applications and technology for separating and categorizing pistachios. Because different pistachio species target different markets, pistachio species classification is needed. This work aims to develop a classification model, based on image processing and machine learning, that is distinct from existing separation approaches and can provide the required categorization. A computer vision application was developed to distinguish among three types of pistachios. A high-resolution camera was used to capture 385 images of these pistachios, and the images were processed with image processing techniques such as segmentation and feature extraction. On the resulting dataset, simple yet successful classifiers based on Decision Tree and Random Forest were constructed, using features extracted from dimension and pixel measurements. According to the experimental data, the proposed approach with Random Forest attained a classification success rate of 100% with both a 70% train / 30% test split and an 80% train / 20% test split. The resulting high-performance classification model meets an important need for separating pistachio types while increasing the economic value of the species.


Introduction
Pistachio has a high economic value worldwide. The quality of the pistachio nuts produced has a significant impact on their popularity among consumers. When pistachios arrive at the processing plant, the following procedures are carried out: (a) dehulling, which separates the soft hull from the nuts; (b) trash and blank separation, which removes blank pistachios and trash such as small branches, remaining shells, and leaves; and (c) unpeeled pistachio separation, which removes unpeeled and unripe nuts (Omid et al., 2009). Pistachio quality is affected by a combination of factors such as planting, manual harvesting, transportation, and storage (Brosnan and Sun, 2002).
Inspecting mixed pistachios and categorizing them into lots of uniform shape and size is desirable to provide consumers with a more uniform product. Visual inspection is typically performed by human operators, and its outcome is influenced by a variety of factors, including operator age, concentration, motivation, fatigue, visual acuity, and room conditions (lighting, heating, ventilation, noise, and so on); for these reasons, automated systems are especially welcome (Omid et al., 2009).
The importance of optimizing post-harvest procedures cannot be overstated. Common practice is to categorize pistachios as either open (with the shell split) or closed (with the shell intact), and these subsets are later processed independently. Pistachios are promoted as a snack food and are typically served roasted; unsplit pistachios are unsuitable for this purpose because they are difficult to open and may contain immature kernels. Therefore, separating pistachios into open and closed shells is an essential part of the post-harvest procedure (Ghezelbash et al., 2013).
Pistachios are rich in healthy nutrients. There are 560 calories in 100 g of pistachios, and they are a good source of protein, fiber, minerals, and B vitamins, including thiamine (B1) and B6. Pistachios have many positive health effects, especially on the cardiovascular system (Kay et al., 2010; Ertürk et al., 2011; Dreher et al., 2012).
Pistachios are grown in 56 of Türkiye's provinces, making the country the world's third-largest producer. Pistachio production in Türkiye has been increased by planting the Kirmizi and Siirt species, which have larger fruits and a lower tendency toward periodic (alternate) bearing (Ertürk et al., 2011).
Pistachios are typically categorized based on several factors, one of the best known being nut quality. Close-shell pistachios are particularly important from an economic, export, and marketing perspective, because pistachios are among the most expensive agricultural products and their prices depend on their quality. A highly precise and user-friendly system is therefore needed to prevent losses from misclassification. In recent years, scientists have developed several methods for accurately identifying agricultural products, most notably pistachios (Mahmoudi et al., 2006). Mechanical winnowing cannot be used for pistachios because close-shell nuts containing a kernel and hollow close-shell nuts are structurally very similar. Additionally, flotation techniques may introduce the carcinogen aflatoxin to pistachios (Pearson et al., 1994).
Pistachios have been sorted using a wide variety of methods, including optical, mechanical, electrical, and acoustic approaches. Using machine vision, it is possible to identify pistachios that are damaged or whose shells have split too early (Pearson, 1996).
As an alternative to more conventional electro-optical and mechanical sorting devices, machine vision can be used to classify pistachio nuts. Interest in using machine vision for sorting and grading agricultural products has increased over the past two decades (Ghazanfari et al., 1998). Low-cost post-harvest systems, such as those that use computer vision for sorting, are growing rapidly.
Ghezelbash et al. (2013) developed a computer-vision-based intelligent system for sorting pistachios at a reasonable cost. This work aims to develop a classification model, based on image processing and machine learning, that is distinct from existing separation approaches and can provide the required categorization.
Different varieties and growing regions produce pistachios with varying sizes, colors, and flavors. Because pistachios are small, classifying them by hand is impractical and time-consuming, so machine vision can play an important role in this context (Anonymous, 2022). Pistachio is also one of the crops whose quality assessment relies on human labor to classify and count nuts according to whether the shell is open or closed. Pistachios are primarily classified by the shape of their shell, which can be open-mouth or close-mouth, and the two types differ in price and value (Rahimzadeh and Attar, 2022).
Real-time applications impose various constraints that make many approaches, such as Fourier methods, spectral methods, or active contours, difficult to implement. As a result, simple procedures are appropriate, provided that they are thoroughly studied and optimized offline. Image threshold segmentation belongs to the class of high-speed algorithms used in real-time image processing applications; such algorithms are optimized and implemented in this work for a close-shell pistachio sorting system. To the best of our knowledge, little work has been published on pistachio sorting systems that use low-cost, basic methods. A low-cost camera is used, which inevitably produces noisy, low-quality images. Furthermore, the camera's limited frame rate causes major problems in segmenting pistachio images and evaluating whether the nuts are open or closed. Multilevel thresholding is used in the image processing stage to address these issues.
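The paper does not state how the multilevel thresholds are chosen. One common option, assumed here rather than taken from the study, is Otsu's criterion extended to two thresholds: every threshold pair on the gray-level histogram is tried, and the pair that maximizes the between-class variance is kept, splitting the image into three intensity classes. The function names below are hypothetical. A minimal NumPy sketch for 8-bit grayscale images:

```python
import numpy as np


def multi_otsu_two_thresholds(gray: np.ndarray) -> tuple[int, int]:
    """Return (t1, t2) splitting an 8-bit image into three classes
    [0..t1], [t1+1..t2], [t2+1..255] by maximizing Otsu's between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                       # probability of each gray level
    levels = np.arange(256, dtype=np.float64)
    omega = np.cumsum(p)                        # cumulative class weight
    mu = np.cumsum(p * levels)                  # cumulative first moment
    mu_total = mu[-1]                           # global mean intensity

    best_var, best_pair = -1.0, (0, 0)
    for t1 in range(0, 254):
        for t2 in range(t1 + 1, 255):
            w0 = omega[t1]
            w1 = omega[t2] - omega[t1]
            w2 = omega[-1] - omega[t2]
            if w0 == 0 or w1 == 0 or w2 == 0:   # skip pairs that leave a class empty
                continue
            m0 = mu[t1] / w0
            m1 = (mu[t2] - mu[t1]) / w1
            m2 = (mu[-1] - mu[t2]) / w2
            var_b = (w0 * (m0 - mu_total) ** 2
                     + w1 * (m1 - mu_total) ** 2
                     + w2 * (m2 - mu_total) ** 2)
            if var_b > best_var:
                best_var, best_pair = var_b, (t1, t2)
    return best_pair


def apply_thresholds(gray: np.ndarray, t1: int, t2: int) -> np.ndarray:
    """Label each pixel 0, 1 or 2 according to the two thresholds."""
    return (gray > t1).astype(np.uint8) + (gray > t2).astype(np.uint8)
```

The exhaustive search covers about 32,000 threshold pairs, which is fast for 256 gray levels; for more classes, a library routine such as scikit-image's `skimage.filters.threshold_multiotsu` may be preferable.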

Material
The analyzed pistachios take their name from the Turkish city of Gaziantep (Antep), a name that has variants in numerous languages. This fruit is referred to as the Antep pistachio in Turkish literature mainly because Türkiye's pistachio processing facilities are concentrated in Gaziantep, where production is focused and from which it is distributed to other regions. Pistachio production occurs in over 40 provinces throughout Türkiye; however, because of temperature and soil characteristics, the Southeastern Anatolia Region accounts for around 95% of output. Şanlıurfa, Gaziantep, Nizip, Siirt, Kahramanmaraş, Adıyaman, and Diyarbakır are among the leading producers in this region. Gaziantep was the best-known pistachio-producing region in Türkiye until 2014; since then, new production areas have shifted to Şanlıurfa. Currently, Şanlıurfa and Gaziantep provinces account for around 80% of overall output (Coban et al., 2022). The pistachio market currently has four varieties: Antep, Siirt, Damascus, and Iranian pistachios. World production of in-shell pistachios in 2021 for the top ten producing countries is shown in Figure 1 (FAO, 2023).

Data collection and enhancement
The images were captured with a Nikon D800 camera in various scenes under a single lighting environment. A white background was used in each scene to allow efficient detection of the pistachios and to prevent unnecessary noise from anything scattered on it. Each image has a resolution of 5184 × 3456 pixels and contains a varying number of items. The camera was placed 35 cm from the pistachio samples. Furthermore, black is employed (Figure 2). The objects are then localized using segmentation. A pistachio nut is depicted in Figure 2.
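As a rough illustration of the localization step, the sketch below assumes that pistachios appear darker than the white background, so a single global threshold separates foreground from background; connected foreground regions are then labeled with a breadth-first flood fill and reported with their bounding boxes and pixel areas. The function name, the threshold of 200, and the `min_area` noise filter are hypothetical values for illustration, not parameters from the study.

```python
from collections import deque

import numpy as np


def localize_objects(gray: np.ndarray, threshold: int = 200, min_area: int = 20) -> list:
    """Segment dark objects on a bright background and return, for each
    4-connected object, its bounding box (row_min, col_min, row_max, col_max)
    and its area in pixels."""
    mask = gray < threshold                     # foreground = darker than the white background
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:         # discard small specks of noise
                ys, xs = zip(*pixels)
                objects.append({"bbox": (min(ys), min(xs), max(ys), max(xs)),
                                "area_px": len(pixels)})
    return objects
```

The pure-Python loop is only meant to show the logic; on full 5184 × 3456 images, a vectorized routine such as `scipy.ndimage.label` would be far faster.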

Image analysis
The images were analyzed and measured using ImageJ (Figure 6). This software is available both as a web-based applet and as a downloadable application, either of which runs on any computer with a Java 1.4 or later virtual machine. Distribution packages are available for multiple platforms, including Windows, OS X, and Linux. It can display, edit, analyze, process, save, and print 8-, 16-, and 32-bit images, and it supports numerous image file formats, such as TIFF, GIF, JPEG, BMP, DICOM, FITS, and 'raw'. It also supports 'stacks', series of images that share a single window. Because it is multithreaded, time-consuming operations such as reading image files can run in parallel with other tasks. It can also calculate area and pixel-value statistics for user-defined selections. In this research, calibration was first performed with the 'Set Scale' command, using a calibration object of known dimensions (Figure 7).
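ImageJ's 'Set Scale' command relates a distance in pixels to a known physical distance. The same arithmetic can be sketched in Python to convert pixel measurements of one segmented nut into millimetres. The helper names, the calibration values in the example, and the use of axis-aligned extents (rather than, say, ImageJ's Feret diameters) are assumptions for illustration only.

```python
import numpy as np


def pixels_per_mm(distance_px: float, known_distance_mm: float) -> float:
    """Scale factor in the spirit of 'Set Scale': pixels per millimetre."""
    return distance_px / known_distance_mm


def measure_mask(mask: np.ndarray, scale: float) -> dict:
    """Area and axis-aligned extents of one segmented nut, in millimetres.
    `mask` is a boolean image that is True on the nut's pixels."""
    ys, xs = np.nonzero(mask)
    length_px = ys.max() - ys.min() + 1         # extent along image rows
    width_px = xs.max() - xs.min() + 1          # extent along image columns
    return {
        "area_mm2": float(mask.sum() / scale**2),  # area scales with the square of the factor
        "length_mm": float(length_px / scale),
        "width_mm": float(width_px / scale),
    }
```

For example, if a 10 mm calibration bar spans 500 pixels, the scale is 50 px/mm, and a region spanning 100 × 150 pixels measures 2 × 3 mm.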

Classification
Decision Tree and Random Forest machine learning algorithms were used to predict which pistachios would be open-shelled and which would be trash. Success metrics, including accuracy, sensitivity, specificity, precision, and F1-score, were defined so that the models could be tested and improved. Table 1 provides an in-depth look at the metrics used to determine success, in which true positive (TP), false positive (FP), true negative (TN), and false negative (FN) denote the four possible outcomes of a prediction. To guard against bias and high variance, 10-fold cross-validation was used to evaluate the pistachio classification algorithms. In this approach, the dataset is divided into 10 equal parts; in each of ten iterations, one part serves as the test set and the remaining nine as the training set, so that every part is used for testing once. The result is obtained by averaging the performance on each test set across the ten iterations. The best c value was selected for efficacy. For the Decision Tree, overall accuracy was 0.99 with a Cohen's kappa of 0.99 at the 60% train / 40% test data rate, 0.98 with a kappa of 0.97 at 70% train / 30% test, and 0.96 with a kappa of 0.94 at 80% train / 20% test (Table 4). For the Random Forest, overall accuracy was 0.99 with a kappa of 0.99 at 60% train / 40% test, and 1.00 (100%) with a kappa of 1 at both 70% train / 30% test and 80% train / 20% test (Table 7).
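The metrics and validation scheme described above can be written out directly. The sketch below computes the binary metrics from TP, FP, TN, and FN, Cohen's kappa from a square confusion matrix (rows are true classes, columns are predictions), and the index splits of k-fold cross-validation in which each fold is used for testing exactly once. It is a generic illustration with hypothetical function names, not the KNIME workflow used in the study.

```python
import numpy as np


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Success metrics from the four outcomes of a binary prediction."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)                # also called recall or true-positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),          # true-negative rate
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def cohens_kappa(confusion) -> float:
    """Cohen's kappa from a square confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(confusion, dtype=np.float64)
    n = cm.sum()
    p_observed = np.trace(cm) / n                              # equals overall accuracy
    p_chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # agreement expected by chance
    return float((p_observed - p_chance) / (1.0 - p_chance))


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)                           # k near-equal parts
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]
```

For instance, with TP = 40, FP = 10, TN = 45, and FN = 5, accuracy is 0.85 and precision is 0.80; for the confusion matrix [[20, 5], [10, 15]], observed agreement is 0.70, chance agreement is 0.50, and kappa is 0.40.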

Discussion
Several studies on this subject have been reported. Ghazanfari et al. (1997, 1998), for instance, classified pistachio nuts into one of three USDA size grades or as close-shell using Fourier descriptors and gray-level histogram features of two-dimensional images. Fourier descriptors require a great deal of processing time and are therefore unsuitable for real-time applications.
The method developed by Pearson and Toyofuku (2000) shows how to distinguish between pistachios with open and closed shells from images. They then developed an automated machine vision system to detect and remove close-shell pistachio nuts during processing. In their system, a material-handling apparatus gently propels nuts past three high-speed line-scan cameras, and digital signal processing boards extract features specific to close- and open-shell pistachios from the camera signals. This machine vision system distinguishes open-shell from close-shell nuts with an accuracy of approximately 95% after two iterations. Pearson and Slaughter (1996) worked on detecting early-split pistachio nuts using machine vision. By combining the cross-sectional area of the unhulled nut with adjacent profile data, they correctly identified 100% of the early-split nuts and 99% of the normal nuts out of a total of 180 nuts evaluated. Ince et al. (2008) used a double-tree undecimated wavelet transform to classify close- and open-shell (Turkish) pistachio nuts. Their method used a small number of features yet achieved 91.5% accuracy on their validation set, whereas an earlier method based on maximum signal amplitude, absolute integration, and gradient features achieved 82% classification accuracy on the same dataset. These results demonstrate that open- and close-shell Turkish pistachios can be classified using time-frequency information extracted from impact acoustics.
Ghezelbash et al. (2013) developed and evaluated an inexpensive computer vision system for sorting close-shell pistachio nuts. To identify close-shell pistachios, their system captures three-dimensional images of the nuts using two flat mirrors and an inexpensive camera. Across the three tests, the average removal accuracy was 92.7% for open pistachio nuts and 86.7% for closed pistachio nuts. Ozkan et al. (2021) developed an improved k-NN classifier to classify pistachio species and, using experimental data, obtained a classification success rate of 94.18%. They provide a high-performance classification method that supports the economic benefits of pistachio species separation, responding to a critical need in the industry.
Deep learning has also been applied to this topic; for instance, Farazi et al. (2017) sorted pistachios with machine vision using transferred mid-level image representations from a convolutional neural network. Across all test images, their model with transferred GoogLeNet weights achieved a final average accuracy of 99% in correctly classifying items, implying nearly faultless classification.
Singh et al. (2022) categorized and analyzed pistachio species using pre-trained deep learning models, achieving success rates of 94.42% with AlexNet, 98.54% with VGG16, and 98.14% with VGG19. Aktaş et al. (2022) examined the impact of different datasets on accuracy in deep learning classification of pistachios; they report a test accuracy of 100% when the AlexNet architecture was trained and tested on their desktop dataset.
Based on the experimental data, the proposed method achieved a classification success rate of 100% with Random Forest at both the 70% train / 30% test and the 80% train / 20% test data rates, equaling or exceeding the accuracies reported in the research literature.

Conclusion
Close-shell pistachios may be rejected by consumers because they are difficult to open and may contain immature kernels. Consequently, it is essential to differentiate them from open-shell pistachio nuts, and different systems are used to separate the two (Ince et al., 2008). Moreover, according to Pearson (2001), mechanical devices incorrectly classify 5 to 10% of all open-shell U.S. pistachios as close-shell, resulting in between $3.75 million and $7.50 million in lost revenue annually. Therefore, the sector requires highly precise categorization systems.
Consequently, a low-cost technique for sorting close-shell pistachios was conceived and implemented in light of the research literature, and machine learning algorithms were used for the evaluations. In this regard, Sharma and Dutta (2023) stress the importance of machine learning algorithms in evaluating agricultural applications. Throughout the training and testing phases, the technology proved particularly adaptable to different mouth types. Future work could update the entire subsystem, including the feeder, exposing, separator, lighting, and camera components; further refinement of the optimization and image processing algorithms should also enable better image processing and analysis.

Figure 1. World production of in-shell pistachios (top ten countries) in 2021 (FAO, 2023).

Figure 3. Live view of Antep pistachios in the digiCamControl image capture software.

Figure 4. Captured view of Antep pistachios in the digiCamControl image capture software.

Figure 5. Classification of pistachios based on mouth states.

Figure 6. The interface of the ImageJ software.

Figure 8. Set Scale command interface with a pistachio and calibration bar.

Figure 10. Mouth measurement of a pistachio with ImageJ software.

Figure 11. KNIME flowchart of Decision Tree and Random Forest machine learning algorithms.

Figures 12-17 show the class statistics and overall statistics at 60-70-80% train and 40-30-20% test data rates, together with the Decision Tree and Random Forest confusion matrices.

Figure 12. Decision tree view of pistachio mouth states at 60% train and 40% test data rate.

Figure 13. Random forest tree view of pistachio mouth states at 60% train and 40% test data rate.

Figure 14. Decision tree of pistachio mouth states at 70% train and 30% test data rate.

Figure 16. Decision tree view of pistachio mouth states at 80% train and 20% test data rate.

Decision Tree and Random Forest confusion matrices, class statistics, and overall statistics at 60-70-80% train and 40-30-20% test data rates are presented in Tables 2-7. According to the experimental data, the proposed approach attained its best classification success rate of 100% with Random Forest at both the 70% train / 30% test and the 80% train / 20% test data rates.

Table 1. Some biological parameters of pistachios

Table 4. Decision tree overall statistics table at 60% train and 40% test, 70% train and 30% test, and 80% train and 20% test data rates

Table 5. Random forest confusion matrix at 60% train and 40% test, 70% train and 30% test, and 80% train and 20% test data rates

Table 6. Random forest class statistics table at 60% train and 40% test, 70% train and 30% test, and 80% train and 20% test data rates

Table 7. Random forest overall statistics table at 60% train and 40% test, 70% train and 30% test, and 80% train and 20% test data rates