Identification of Rice Varieties Using Machine Learning Algorithms

Rice, one of the most widely produced and consumed grain products worldwide, is a staple food in our country as well because it is both economical and nutritious. Rice passes through several processing stages between the field and the dinner table. In the cleaning stage, rice is separated from unwanted materials. In the classification stage, whole and broken grains are separated and calibration operations are performed. Finally, in the color-based extraction stage, striped and stained grains that deviate from the whiteness of the grain surface are separated. In this paper, five rice varieties belonging to the same trademark were selected and classified using morphological, shape and color features. A total of 75,000 rice grain images, 15,000 for each variety, were obtained. The images were pre-processed using MATLAB software and prepared for feature extraction. A total of 106 features were extracted from the images: 12 morphological features, 4 shape features and 90 color features obtained from five different color spaces. For classification, models were created with the k-nearest neighbor, decision tree, logistic regression, multilayer perceptron, random forest and support vector machine algorithms. With these models, performance measurement values were obtained for feature sets of 12, 16, 90 and 106 features. For morphological features, the highest average classification accuracy was 97.99%, obtained with random forest. For morphological and shape features, it was 98.04%, again with random forest. For color features, it was 99.25%, obtained with logistic regression. Finally, for morphological, shape and color features together, it was 99.91%, obtained with multilayer perceptron. The results show that classification success increases with the addition of each new group of features. Based on the performance measurement values obtained, the study can be said to have succeeded in classifying rice varieties.


Introduction
In terms of worldwide production of grain products, rice is the most important product after wheat and corn. Rice is a grain product that is rich in carbohydrates and starch. Because it is nutritious and economical, it is of great importance in human nutrition in Turkey, as it is throughout the world, and it is also widely used in industry. Different quality criteria are available for rice varieties produced in Turkey. These criteria are physical appearance, cooking features, taste and aroma features, as well as efficiency (Tipi et al. 2009). Determining the physical features among these quality criteria can be expensive and unreliable when done with traditional manual rice seed classification, because human decisions are inconsistent, subjective and slow. Machine vision systems offer an automated alternative that is non-destructive, cost-effective, fast and accurate.
Through studies in recent years using machine vision systems and image processing techniques on grain products, it is seen that the products are examined in terms of many physical features such as color, texture, quality, and size. Studies on grain varieties in the literature are examined and summarized below.
Studies that did not use color features are summarized as follows. Yadav & Jindal (2001) performed digital image analysis of ground rice to check whiteness and determine the percentage of broken seeds. The length, perimeter and shape features of the rice grain were extracted and their quantities were calculated. Dubey et al. (2006) used 45 morphological features for artificial neural network-based classification. Increasing the number of features was seen to increase the success rate, and they achieved a classification accuracy of approximately 88% for all grains. Zapotoczny et al. (2008) discussed the utility of morphological features in the classification of five different barley varieties. In the study, they extracted 74 morphological features of each barley variety. They used principal component analysis (PCA), linear discriminant analysis (LDA) and nonlinear discriminant analysis (NDA) as classification methods. They concluded that the method based on morphological features could be used successfully to identify barley varieties, and that LDA was the best of the classification methods. Aggarwal & Mohan (2010) performed aspect ratio analysis using image processing to assess the grain quality of rice. The aim of the analysis was to examine mixtures by taking samples from three different classes (full, semi and broken) that are sold in markets and priced according to their size, and to determine the reference aspect ratio in the market. OuYang et al. (2010) designed an automatic image-receiving tape system that uses image processing to distinguish between five different rice seeds. Using a back-propagation artificial neural network (BP-ANN), they obtained an average accuracy rate of 86.65% for five different rice varieties. Abirami et al. (2014) used image processing and neural network pattern recognition techniques to classify Basmati rice grains. Various morphological features were extracted from camera images and classified with neural network pattern recognition, which gave an accuracy rate of 98.7%. Sethy & Chatterjee (2018) classified six rice varieties from their geometric and texture features using the multi-class support vector machine (M-SVM) algorithm and achieved 92% accuracy. Chen et al. (2019) developed a machine vision system to study broken, calcareous, stained and defective rice grains using morphological features on images of red Indica rice. They used support vector machine (SVM) for classification and achieved results of 99.3%, 96.3% and 93.6%, respectively, as the accuracy of recognizing broken, calcareous, stained and defective rice grains. Koklu & Ozkan (2020) performed classification using morphological and shape features in images of seven different dry bean varieties. Classification models were created using the multi-layer perceptron (MLP), SVM, k-nearest neighbors (kNN) and decision tree (DT) machine learning methods. The SVM model achieved the highest classification accuracy, 93.13%.
Studies that used color features are summarized as follows. Visen et al. (2003) obtained color images of five types of grains (barley, oats, rye, wheat, and durum wheat) and developed algorithms to analyze these images using image processing techniques. They developed an artificial neural network-based classifier to identify unknown grains from more than 150 color and textural features of the resulting images, and achieved an accuracy rate of over 90% for all grain types. Demirbas & Dursun (2007) aimed to determine the morphological features of 13 different wheat varieties using image processing. Images were evaluated using UTHSCSA Image Tool version 3.0 as the image processing program. Because the measurement results obtained manually and by image processing were very close, they stated that image processing can be used to determine some of the physical features of wheat grains. Silva & Sonnadara (2013) used an artificial neural network to classify rice varieties. In the study, they developed an algorithm to extract 13 morphological features, 6 color features and 15 texture features using images from 9 different rice varieties. They classified with each feature group separately and with all groups together. In the separate classifications, texture features gave a higher success rate than morphological and color features. In the classification that combined all features, the accuracy rate was 92%. Kaur & Singh (2013) studied a machine algorithm for rice classification using multi-class support vector machines. They classified rice grains using their shape features, percentages and opaque state and achieved an accuracy rate of more than 86%.
Abbaspour-Gilandeh et al. (2020) analyzed digital images of 13 rice varieties in Iran in three different forms, with pre-processing and segmentation done in MATLAB. Ninety-two features were extracted for each rice variety: 60 color, 14 morphological and 18 texture features. The least significant difference (LSD) test was performed to obtain a more accurate comparison between varieties. PCA was used to reduce the dimensionality of the data and focus on the most effective components. Using discriminant analysis (DA), they achieved classification accuracies of 89.2%, 87.7% and 83.1% for paddy, brown rice and white rice, respectively.
The studies in the literature obtain product features such as morphological, shape and color features from images of different grain products using various image processing techniques, and then use these features for classification with different machine learning methods. In this study, morphological, shape and color features were extracted for non-destructive, fast and accurate classification of rice varieties. The resulting features were used as inputs to machine learning methods for classification. To see the effect of the features on the classification result, the feature groups were combined step by step and the results were examined in detail. In this way, the contribution of each feature group to the classification was interpreted.

Material and Methods
The aim of this study is to extract morphological features, shape features and color features by obtaining images from 5 different rice varieties. It is also to perform classification operations of the obtained features using various artificial intelligence techniques. Figure 1 shows the classification flow chart.

Image acquisition
The mechanism shown in Figure 2 was used to obtain images of the rice used in the study. Images were captured with an Ikegami camera that has a CCD imaging sensor. The camera has 2.2 megapixels, a resolution of 2048 × 1088, and a maximum frame rate of 53.7 fps at full resolution. Features such as white balance and backlight correction are available. It is powered by 12 V DC and has a power consumption below 4.5 W (Ikegami 2020).
The camera was mounted on a closed box that contains a lighting device and is built to keep out light from the external environment. The background color of the box was chosen as black to make the images easier to process. The box was sized so that images can be captured from an area 14 cm wide and 18 cm long. The height of the camera was set to 15 cm. The resulting images were transferred to a computer and recorded.

Image processing
To make feature extraction and classification as accurate as possible, the images were pre-processed before feature extraction. Image processing was carried out with MATLAB software. Images taken from the camera were first converted to grayscale. Each grayscale image was then converted to a binary image using a global threshold level determined with the Otsu method (Kurita et al. 1992). Unwanted objects were removed from the resulting binary images by applying the morphological opening operation, which prepared the images for the feature extraction stage. Figure 3 shows the stages of image preprocessing.
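The thresholding step can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the study, it works on a made-up 4 × 4 grayscale image, and the function names are hypothetical. It covers only the Otsu threshold and the binarization, not the opening operation.

```python
def otsu_threshold(gray_pixels, levels=256):
    """Return the gray level that maximizes the between-class variance (Otsu)."""
    hist = [0] * levels
    for value in gray_pixels:
        hist[value] += 1
    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]            # pixels with value <= t (background)
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # pixels with value > t (foreground)
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray_image, threshold):
    """Map a 2-D list of gray values to 0/1; pixels above the threshold become 1."""
    return [[1 if value > threshold else 0 for value in row] for row in gray_image]


# Made-up image: dark background with a bright 2 x 2 "grain" in the middle.
image = [
    [10, 12, 11, 10],
    [12, 200, 210, 11],
    [10, 205, 198, 12],
    [11, 10, 12, 10],
]
flat = [value for row in image for value in row]
threshold = otsu_threshold(flat)
mask = binarize(image, threshold)
```

Because the box background is black, a single global threshold of this kind is usually enough to separate grain pixels from background pixels.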

Feature extraction
In the study, 12 morphological features using MATLAB software, 4 shape features obtained using morphological features and 90 color features obtained using five different color spaces were extracted.
Morphological and shape features were obtained using the components of the MATLAB regionprops function. Shape features are calculated from the area, major axis length and minor axis length among the morphological features. The resulting feature values are expressed in pixels for each rice grain. The list of morphological features is given in Table 1 and the list of shape features in Table 2 (Pazoki et al. 2014).

Morphological features:
Perimeter: The length of the boundary of the rice grain.
Major_Axis_Length (L): The longest line that can be drawn on a grain of rice.
Minor_Axis_Length (l): The longest line on a grain of rice that can be drawn perpendicular to the major axis.
Eccentricity (E): The eccentricity of the ellipse that has the same moments as the region.
Equivalent diameter: The diameter of a circle with the same area as the rice grain. The calculation formula is given in Equation 1.
Ratio of pixels in the convex body to pixels in the rice grain region. The calculation formula is given in Equation 2.
Convex_Area (CA): The number of pixels in the smallest convex polygon that can accommodate the rice grain area.
Extent (Ex): The ratio of pixels in the bounding box to pixels in the rice grain region.
Aspect_Ratio (AR): The major axis length divided by the minor axis length, AR = L / l. The calculation formula is given in Equation 3.
Roundness (R): Calculated from the area and the perimeter. The calculation formula is given in Equation 4.
Ratio of the equivalent diameter to the major axis length. The calculation formula is given in Equation 5.

Shape features:
The major axis length divided by the area. The calculation formula is given in Equation 6.
The minor axis length divided by the area. The calculation formula is given in Equation 7.
A shape feature whose calculation formula is given in Equation 8.
A shape feature whose calculation formula is given in Equation 9.
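The derived quantities described in words above can be sketched as follows. This is a Python illustration, not the MATLAB code of the study. The function and key names are hypothetical, and the roundness formula 4πA/P² is an assumed standard definition, since the text states only that roundness is computed from area and perimeter. The features of Equations 2, 8 and 9 are not described in enough detail to be included.

```python
import math


def derived_shape_features(area, perimeter, major_axis, minor_axis):
    """Compute the derived features that the text describes in words.

    All inputs are in pixel units, as returned by a region-measurement step.
    """
    equivalent_diameter = math.sqrt(4.0 * area / math.pi)
    return {
        # Diameter of a circle with the same area as the grain.
        "equivalent_diameter": equivalent_diameter,
        # Major axis length divided by minor axis length.
        "aspect_ratio": major_axis / minor_axis,
        # ASSUMPTION: standard roundness definition, 1.0 for a perfect circle.
        "roundness": 4.0 * math.pi * area / perimeter ** 2,
        # Equivalent diameter divided by major axis length.
        "equivalent_diameter_over_major": equivalent_diameter / major_axis,
        # Major and minor axis lengths divided by the area.
        "major_over_area": major_axis / area,
        "minor_over_area": minor_axis / area,
    }


# Sanity check on an ideal circle of radius 10 pixels.
circle = derived_shape_features(
    area=math.pi * 10 ** 2, perimeter=2 * math.pi * 10, major_axis=20, minor_axis=20
)
```

For an ideal circle, the equivalent diameter equals the major axis and the aspect ratio and roundness are both 1, which makes a convenient check of the formulas.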
Color (RGB) images of rice grains used in the study were converted from RGB color spaces to HSV, L*a*b*, YCbCr and XYZ color spaces using MATLAB software. Conversion formulas and explanation are given in Table 3 (Chaudhary et al. 2012;Pazoki et al. 2014).

No. Conversion: Explanation
1. RGB-HSV conversion: The HSV color space consists of three parameters: hue (H), saturation (S), and value (V).
2. RGB-L*a*b* conversion: In the L*a*b* color space, the value L* denotes lightness, where 0 denotes black and 100 denotes white. The value a* refers to red and green.
3. RGB-YCbCr conversion: The YCbCr color space consists of brightness (Y), blue difference (Cb), and red difference (Cr) components.
4. RGB-XYZ conversion: In the XYZ color space, X denotes red, Z denotes blue, and the Y component also denotes brightness.
After the conversion process, the values of each color space were read using the mean intensity (MeanIntensity) and pixel value (PixelValue) components of the regionprops function in MATLAB. Using the RGB, HSV, L*a*b*, YCbCr and XYZ color spaces, a total of 90 color features were extracted by computing the mean, standard deviation, skewness, kurtosis, entropy and wavelet decomposition for each color channel, that is, 6 components for each of the 3 channels of the 5 color spaces (Arefi et al. 2011; Kaya & Saritas 2019). Explanations of the components applied to the color features are given in Table 4, and the list of the resulting color features is given in Table 5.

No. Feature: Explanation
1. Mean: The mean density value (X represents the input data, a vector of N values). The calculation formula is given in Equation 24.
2. Standard deviation: Returns the standard deviation of the pixel values. The standard deviation is the square root of the variance (V). The calculation formulas are given below.
3. Skewness (Sk): Returns the skewness value of the pixel values. The calculation formula is given in Equation 27.
4. Kurtosis: Returns the kurtosis value of the pixel values. The calculation formula is given in Equation 28.
5. Entropy: Returns the entropy of the pixel values. Entropy is a statistical measurement used to characterize the image texture. The calculation formula is given in Equation 29.
6. Wavelet decomposition: Returns the wavelet decomposition of the pixel value matrix using the two-dimensional wavelet. The WaveDec2 function was used and the DB4 wavelet was selected.
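The first five components can be sketched for a single color channel as follows. This is a Python illustration with hypothetical names, not the MATLAB code of the study. Because Equations 24 to 29 are not reproduced here, the sketch assumes the common population definitions (division by N), a non-excess kurtosis, and a 256-bin base-2 entropy. The wavelet component is omitted.

```python
import math


def channel_statistics(pixels, levels=256):
    """Return mean, std, skewness, kurtosis and entropy of one color channel.

    `pixels` is a flat list of integer values in [0, levels).
    ASSUMPTION: population (divide-by-N) moments and base-2 histogram entropy.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:
        skewness = kurtosis = 0.0  # constant channel: higher moments undefined

    hist = [0] * levels
    for p in pixels:
        hist[int(p)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in hist if c > 0)

    return {"mean": mean, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}


# Feature count stated in the text: 5 color spaces x 3 channels x 6 components.
N_COLOR_FEATURES = 5 * 3 * 6

# Made-up channel with two equally frequent values.
stats = channel_statistics([0, 0, 255, 255])
```

Applying such a function to every channel of every color space, plus one wavelet-based value per channel, gives the 90 color features counted above.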

Cross validation
In data mining and artificial intelligence techniques, where model development data is scarce, the most common procedure that can be used to check the model's generalization ability is the k-fold cross validation method (Singh & Panda 2011).
Cross validation is an error estimation method developed to improve the reliability of classification. It works by randomly dividing the dataset into a chosen number of subsets (k) for training and testing. One of the subsets is taken as the test set and the system is trained with the remaining subsets. This process is repeated k times, so that each subset is used once for testing (Browne 2000). Figure 4 shows the working logic of cross validation.

Figure 4-The working logic of cross validation
In the example given in Figure 4, the number of iterations (k) was selected as 10, so the dataset was divided into 10 sections. In each iteration, nine sections were used as training data and one as test data. The process was repeated for all subsets, which completed the system test (Berrar 2019).
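The splitting logic of k-fold cross validation can be sketched in a few lines. This is a Python illustration with a hypothetical function name and a fixed random seed chosen for reproducibility. It produces only index splits and does not train any model.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k nearly equal subsets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 20 samples and k = 10 give 10 iterations with 18 training
# samples and 2 test samples each.
splits = list(k_fold_indices(20, k=10))
```

Each sample appears in exactly one test fold, so every sample is used once for testing and k - 1 times for training.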

Kappa test
The Kappa test is a statistical method used to measure reliability by examining the agreement between two or more observers (Kilic 2015). Kappa coefficient values range from -1 to +1. A value of +1 indicates complete agreement between observers, a value of 0 indicates agreement that is no better than chance, and a value of -1 indicates complete disagreement. The interpretation of kappa coefficient value ranges is given in Table 6 (Landis & Koch 1977; Kilic 2015).

Figure 5 and technical information about the varieties is given in Table 7.

Basmati    8.5-11.5   3.5-4.5   507
Ipsala     9-11       4-5.5     425
Jasmine    6.5-10     2.5-3.5   547
Karacadag  4.5-6      3-4       513

In our study, 15,000 images of rice grains were obtained for each rice variety. In total, the study was carried out on data from 75,000 grains of rice (Cinar 2019).

Performance evaluation
In classification problems, the success of a newly created or existing model is measured by the number of correct predictions. This reflects the accuracy of the classification rather than showing whether the model is good or not. The confusion matrix is therefore used to explain the predictive performance of classification. The confusion matrix relates the actual classes to the classes predicted by a classification model on test data (Cataloluk 2012; Cinar & Koklu 2019). The confusion matrix used for binary classification is given in Table 8, and the confusion matrix used for multiclass classification is given in Table 9 (Hossin & Sulaiman 2015).

                    Predicted class
                    Positive              Negative
Actual class
Positive            True positive (tp)    False negative (fn)
Negative            False positive (fp)   True negative (tn)

The accuracy of a classification can be evaluated by counting the correctly recognized class instances (true positives), the correctly recognized instances that do not belong to the class (true negatives), the instances that are incorrectly assigned to the class (false positives) and the class instances that are not recognized (false negatives) (Sokolova & Lapalme 2009).
Calculation formulas for success criteria such as accuracy, error rate, recall, specificity, precision and F1 score, calculated from the confusion matrix for binary classification performance measurements, are given in Table 10. Calculation formulas for average accuracy, average recall, average precision, average error rate, and average F1-score, calculated from the confusion matrix for multi-class classification performance measurements, are given in Table 11 (Hossin & Sulaiman 2015).
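The following Python sketch shows how such measures, together with the kappa coefficient, can be derived from a multi-class confusion matrix. The names are hypothetical and the matrix is made up. The sketch assumes overall accuracy, macro-averaged precision and recall, an F1-score computed from those two averages, and Cohen's kappa. The exact definitions in Tables 10 and 11 are not reproduced here and may differ.

```python
def multiclass_metrics(cm):
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    ASSUMPTION: every class occurs and is predicted at least once.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # actual counts
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted counts
    correct = sum(cm[i][i] for i in range(k))

    accuracy = correct / total
    recall = sum(cm[i][i] / row_sums[i] for i in range(k)) / k       # macro average
    precision = sum(cm[i][i] / col_sums[i] for i in range(k)) / k    # macro average
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    return {"accuracy": accuracy, "error": 1 - accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Made-up 3-class confusion matrix, not data from the study.
example = multiclass_metrics([
    [48, 1, 1],
    [2, 47, 1],
    [0, 1, 49],
])
```

When all classes have the same number of samples, as in a dataset with an equal number of grains per variety, the macro-averaged recall coincides with the overall accuracy.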

K-Nearest neighbor (K-NN)
The K-NN method is a nonparametric learning algorithm. To classify the dataset, K-NN calculates the distance between data points using the Euclidean distance, and K represents the number of neighbors considered (Kumar et al. 2019).
K-NN is intended to classify sample data whose class is unknown. To do so, the distance between the sample and each pre-classified sample in the training set is calculated, so each test sample is compared individually with all the existing data. The test sample will have many neighbors that are close to it in terms of the measured features, so the K training samples closest to it are selected. The test sample is then assigned to the class to which most of these K neighbors belong (Richman 2011; Beyaz & Ozturk 2016). For this study, the K value was set to 10.
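The procedure described above can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and made-up two-dimensional data, not the implementation used in the study. The default k = 10 follows the text, while the toy call uses k = 3 because the example has only six training points.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=10):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the class labels of the k closest samples and return the winner.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors for two hypothetical classes "A" and "B".
train_x = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (9.0, 9.5), (10.0, 10.0), (9.5, 9.0)]
train_y = ["A", "A", "A", "B", "B", "B"]
label = knn_predict(train_x, train_y, (1.0, 1.0), k=3)
```

Because every query is compared with all training samples, the cost of a prediction grows with the size of the training set.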

Decision tree (DT)
DT is one of the first classification methods that comes to mind along with neural networks in data mining. If DT is generally thought as a tree diagram, it branches so that it has a classification query on each of its branches and nodes (Safavian & Landgrebe 1991).
DT's features in dealing with complex problems and their inferences in logical classification rules are seen as advantages (Amor et al. 2006). In addition, DT's integration into databases is easy and their reliability is high, making it stand out among other classification models.

Logistic regression (LR)
LR is one of the commonly used statistical models. In LR, the dependent variable is estimated from one or more independent variables, and the model describes the relationship between them. The variables do not need to be normally distributed. Because LR predicts the probability of an outcome rather than the outcome itself, its predicted values are limited to the range between 0 and 1 (Cruyff et al. 2016; Kalantar et al. 2018).
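The reason the predictions stay between 0 and 1 can be seen in a short sketch of the logistic function for the binary case. This is a Python illustration with hypothetical names and made-up weights. How the five-class problem of this study was handled by LR is not stated in the text, so the sketch does not attempt it.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Made-up weights and features, for illustration only.
p = predict_probability(weights=[0.8, -0.4], bias=0.1, features=[1.5, 2.0])
```

Whatever the value of the linear combination, the logistic function maps it to a number between 0 and 1 that can be read as a probability.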

Multilayer perceptron (MLP)
Many artificial neural network models have been developed for specific purposes, and MLP is one of the most widely used of these models. In MLP, the neurons are arranged in layers: two main layers and at least one hidden layer between them. MLP can contain more than one hidden layer. The input layer, which is the first of the main layers, is the layer where the data is read and contains information about the problem to be solved. The output layer, which is the second main layer, is the layer where the classes are defined and the outputs for the information processed in the network are received. The hidden layer is where intermediate operations are performed on the data between the main layers (Sabanci 2016).
The input layer of the MLP has as many neurons as the number of features, and data flows in one direction from the input layer to the output layer. In addition, the network structure can be monitored and modified during the training period (Arora 2012). In this study, 4 hidden layers were used together with the sigmoid activation function.
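The one-directional flow of data can be sketched with a forward pass through a network with 4 hidden layers and sigmoid activations. This is a Python illustration with hypothetical names. The hidden layer width of 16, the random untrained weights and the random input vector are assumptions made for the example, since the text gives only the number of hidden layers and the activation function.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_inputs, n_neurons, rng):
    """One row per neuron: n_inputs weights followed by one bias term."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def forward(layers, features):
    """Propagate one feature vector from the input layer to the output layer."""
    activations = list(features)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activations)) + neuron[-1])
            for neuron in layer
        ]
    return activations


rng = random.Random(0)
# 106 input features, 4 hidden layers (width 16 is hypothetical), 5 output classes.
layer_sizes = [106, 16, 16, 16, 16, 5]
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

features = [rng.random() for _ in range(106)]   # made-up, already scaled to [0, 1)
outputs = forward(layers, features)
predicted_class = max(range(len(outputs)), key=outputs.__getitem__)
```

Training would adjust the weights, for example by back-propagation. The sketch shows only how a trained network would turn one feature vector into five class scores.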

Random forest (RF)
RF is a classifier consisting of multiple DTs. To classify a new input, each DT provides its own classification, and RF then selects the class with the most votes. RF can handle a large number of variables in a dataset and is also quite successful at predicting with incomplete data. The biggest drawback of RF is its lack of repeatability. In addition, because it contains many independent decision trees, the final model and its results are difficult to interpret (Oshiro et al. 2012).

Support vector machine (SVM)
SVM is a kernel-based method that creates a hyperplane for classification and regression. Different kernel functions can be used in SVM models. In this study, classification was performed using the polynomial kernel function.
SVM separates data with a line in two-dimensional space, a plane in three-dimensional space and a hyperplane in multidimensional space (Abhang et al. 2016). It performs classification by finding the best hyperplane that separates the data belonging to the classes.
SVMs share features with other classification algorithms. They resemble neural networks and, even more closely, the K-NN algorithm. Like K-NN, SVM determines its neighbors based on the sample data presented to the algorithm and uses them to make estimates for new data (Shi et al. 2011; Abhang et al. 2016).
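The polynomial kernel mentioned above can be written as a short function. This is a Python illustration with hypothetical names. The degree, scale (gamma) and offset (coef0) values are assumptions, since the text does not report the kernel parameters used in the study.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Made-up feature vectors, for illustration only.
value = polynomial_kernel([1.0, 2.0], [3.0, 4.0], degree=2)
```

The kernel lets the SVM search for a separating hyperplane in a higher-dimensional feature space without computing the coordinates of that space explicitly.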

Results
From the 75,000 images of rice grains belonging to the rice varieties used in the study, the 12 morphological features listed in Table 1 were extracted. Classification was performed on these 12 features using the K-NN, DT, LR, MLP, RF and SVM algorithms. To increase classification accuracy, the 4 shape features given in Table 2 were added to the 12 morphological features, and classification was performed with the same algorithms on the resulting 16 features. Considering these results together with the studies in the literature, it was expected that success would increase further when morphological, shape and color features were evaluated together. For this reason, the 90 color features listed in Table 5 were extracted. The color features were first evaluated independently of the morphological and shape features, and classification was performed with the same algorithms on the 90 features. Then the morphological, shape and color features were evaluated together, giving a total of 106 features, and classification was performed using the K-NN, DT, LR, MLP, RF and SVM algorithms.
In the study, the confusion matrix and performance measurement values of the classification results obtained from the algorithms for the 12 morphological features, the 16 morphological and shape features, the 90 color features and the 106 morphological, shape and color features are given in Table 12, respectively. For all algorithms used in the study, the average accuracy, error, precision, recall and F1-score values and the kappa coefficient values obtained from the confusion matrix using only morphological features are given in Table 13. The average performance measurement values in Table 13 show that the classification accuracy of all algorithms is above 97%. The best classification accuracy, 97.99%, belongs to the random forest algorithm. The lowest classification accuracy, 97.02%, belongs to the support vector machine algorithm.
For the random forest algorithm, which has the best classification accuracy, the confusion matrix for the morphological features is given in Table 12. According to the table, the accuracy rates of the Arborio, Basmati, Ipsala, Jasmine and Karacadag rice varieties are 96.61%, 98.09%, 99.53%, 97.98% and 97.72%, respectively. Figure 6 shows the accuracy rates of the classification algorithms obtained from morphological features for the rice varieties used in the study.

Figure 6-Accuracy rates of classification algorithms obtained from morphological features for all rice varieties used in the study
The Ipsala rice variety, which has the highest accuracy rate among the varieties in the random forest algorithm, also reaches the highest accuracy rates in the other algorithms. The Arborio variety, on the other hand, has a lower accuracy rate than the other varieties. According to Table 14, the number of correctly classified rice grains was 72,925 for the K-NN algorithm, 73,359 for the DT algorithm, 73,370 for the LR algorithm, 73,132 for the MLP algorithm, 73,528 for the RF algorithm and 73,067 for the SVM algorithm.
For all algorithms used in the study, the average accuracy, error, precision, recall and F1-score values and the kappa coefficient values obtained from the confusion matrix by evaluating morphological and shape features together are given in Table 15. The average performance measurement values in Table 15 show that the classification accuracy of all algorithms is above 97%. The best classification accuracy, 98.04%, belongs to the random forest algorithm. The lowest classification accuracy, 97.23%, belongs to the k-nearest neighbor algorithm.
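The correctly classified counts reported from Table 14 and the accuracy values reported from Table 15 can be cross-checked with simple arithmetic, since accuracy is the number of correct classifications divided by the 75,000 grains. The short Python sketch below, with hypothetical names, does this for the six algorithms.

```python
TOTAL_GRAINS = 75_000

# Correctly classified grains per algorithm, as quoted in the text.
correct_counts = {
    "K-NN": 72_925,
    "DT": 73_359,
    "LR": 73_370,
    "MLP": 73_132,
    "RF": 73_528,
    "SVM": 73_067,
}


def accuracy_percent(correct, total=TOTAL_GRAINS):
    """Accuracy in percent, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


accuracies = {name: accuracy_percent(count) for name, count in correct_counts.items()}
best = max(accuracies, key=accuracies.get)
worst = min(accuracies, key=accuracies.get)
```

The counts reproduce the reported extremes: 73,528 / 75,000 gives 98.04% for RF and 72,925 / 75,000 gives 97.23% for K-NN.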
For the random forest algorithm, which has the best classification accuracy, the confusion matrix for the morphological and shape features is given in Table 14. According to the table, the accuracy rates of the Arborio, Basmati, Ipsala, Jasmine and Karacadag rice varieties are 96.72%, 98.11%, 99.58%, 97.96% and 97.82%, respectively. Figure 7 shows the accuracy rates of the classification algorithms obtained from morphological and shape features for the rice varieties used in the study.

Figure 7-Accuracy rates of classification algorithms obtained from morphological and shape features for all rice varieties used in the study
The classification accuracy rates obtained from morphological and shape features show that the Ipsala rice variety has the highest accuracy rate among the varieties. The Arborio variety, on the other hand, has a lower accuracy rate than the other rice varieties. For all algorithms used in the study, the average accuracy, error, precision, recall and F1-score values and the kappa coefficient values obtained from the confusion matrix using only color features are given in Table 17. According to the average performance measurement values in Table 17, the best classification accuracy, 99.25%, belongs to the logistic regression algorithm. The lowest classification accuracy, 97.71%, belongs to the decision tree algorithm.
For the logistic regression algorithm, which has the best classification accuracy, the confusion matrix for the color features is given in Table 16. According to the table, the accuracy rates of the Arborio, Basmati, Ipsala, Jasmine and Karacadag rice varieties are 98.95%, 99.06%, 100%, 99.10% and 99.17%, respectively. Figure 8 shows the accuracy rates of the classification algorithms obtained from color features for the rice varieties used in the study.

Figure 8-Accuracy rates of classification algorithms obtained from color features for all rice varieties used in the study
The Ipsala rice variety, which has a 100% accuracy rate in the logistic regression algorithm, also reaches the highest accuracy rates in the other algorithms. The Arborio and Basmati varieties, on the other hand, have lower accuracy rates than the other varieties. For all algorithms used in the study, the average accuracy, error, precision, recall and F1-score values and the kappa coefficient values obtained from the confusion matrix by evaluating morphological, shape and color features together are given in Table 19. The average performance measurement values in Table 19 show that the classification accuracy of all algorithms is above 99%.
The best classification accuracy, 99.91%, belongs to the multi-layer perceptron algorithm. The lowest classification accuracy, 99.69%, belongs to the decision tree algorithm. The multi-layer perceptron algorithm also has the best values for the other performance measurements. A high F1-score indicates that an algorithm performs well in terms of classification accuracy.
For the multi-layer perceptron algorithm, which has the best classification accuracy, the confusion matrix in which morphological, shape and color features are evaluated together is given in Table 18. According to the table, the accuracy rates of the Arborio, Basmati, Ipsala, Jasmine and Karacadag rice varieties are 99.81%, 99.99%, 100%, 99.91% and 99.92%, respectively. Figure 9 shows the accuracy rates of the classification algorithms obtained for the rice varieties used in the study.

Figure 9-Accuracy rates of classification algorithms obtained from morphological, shape and color features for all rice varieties used in the study
In the multilayer perceptron algorithm, the Ipsala rice variety, which has a 100% accuracy rate, also reaches the highest accuracy rates in the other algorithms. The Arborio and Basmati varieties, on the other hand, have lower accuracy rates than the other varieties.

Conclusions
In this study, a total of 75,000 rice grain images were obtained from 5 different rice varieties for classification. These images were pre-processed with MATLAB software to remove unwanted materials that may be present in the image and were prepared for the feature extraction stage.
First, 12 morphological features were extracted from the pre-processed images. Then 4 shape features were added, giving a total of 16 morphological and shape features. Finally, 90 color features were extracted from 5 different color spaces, giving a total of 106 features in which morphological, shape and color features were evaluated together. Morphological features were obtained in MATLAB software using the regionprops function, and shape features were derived from these morphological features. The RGB, HSV, L*a*b*, YCbCr and XYZ color spaces were used for color features. With MATLAB software, each RGB image was converted to the other color spaces using its pixel values. After color conversion, a total of 90 color features were obtained for the 5 color spaces using mean, standard deviation, skewness, kurtosis, entropy and wavelet decomposition components.
The K-NN, DT, LR, MLP, RF and SVM algorithms, which are among the most commonly used machine learning techniques, were used for classification. A confusion matrix was obtained for each algorithm and used for performance evaluation. To assess the generalization ability of the algorithms, 10-fold cross-validation was used.
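The statistical color descriptors named above can be illustrated for a single color channel with the following minimal Python sketch; the study itself used MATLAB. The sketch makes several assumptions: population (biased) moments, non-excess kurtosis, and entropy computed from the distribution of distinct pixel values. The wavelet decomposition component is omitted because its details are not given here, and the breakdown of 5 color spaces x 3 channels x 6 descriptors = 90 features is a presumed reading of the feature count. The function name channel_statistics is hypothetical.

```python
import math
from collections import Counter


def channel_statistics(pixels):
    """Mean, standard deviation, skewness, kurtosis and entropy of one channel.

    pixels is a flat list of pixel values for a single color channel.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)

    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / n / std ** 3
        kurtosis = sum((p - mean) ** 4 for p in pixels) / n / std ** 4
    else:
        # A constant channel has no spread; report zeros by convention here.
        skewness = 0.0
        kurtosis = 0.0

    # Shannon entropy (in bits) of the distribution of pixel values.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


# Presumed breakdown of the 90 color features (an assumption, see above).
COLOR_SPACES = ["RGB", "HSV", "L*a*b*", "YCbCr", "XYZ"]
DESCRIPTORS = ["mean", "std", "skewness", "kurtosis", "entropy", "wavelet"]
CHANNELS_PER_SPACE = 3
n_color_features = len(COLOR_SPACES) * CHANNELS_PER_SPACE * len(DESCRIPTORS)

# Hypothetical tiny channel: half black, half white pixels.
stats = channel_statistics([0, 0, 255, 255])
print(stats, n_color_features)
```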
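The 10-fold cross-validation scheme can be sketched as follows. This minimal Python example assumes a stratified split, in which each fold keeps the class proportions; the study does not state whether stratification was used. The function name stratified_k_fold and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples of each class are shuffled and dealt round-robin into k folds,
    so every fold has roughly the same class proportions.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy data: 20 samples for each of two varieties (illustrative only).
labels = ["Arborio"] * 20 + ["Basmati"] * 20
splits = list(stratified_k_fold(labels, k=10))
for train_idx, test_idx in splits:
    # A model would be trained on train_idx and evaluated on test_idx here.
    pass
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each sample appears in exactly one test fold, so the average of the per-fold scores uses every image for evaluation once.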
The classification accuracies of the algorithms are given in Table 20 for four feature sets: morphological features; morphological and shape features; color features; and morphological, shape and color features together. Using only morphological features, all algorithms achieved over 97% accuracy, with the highest, 97.99%, obtained by the random forest algorithm. To improve on this result, shape features were added, giving 16 morphological and shape features; the highest classification accuracy, 98.04%, was again obtained by the random forest algorithm. Studies in the literature suggested that adding color features to the morphological and shape features would increase success further. For this reason, 90 color features were extracted from the color images of the rice grains. With color features alone, the highest classification accuracy, 99.25%, was obtained by the logistic regression algorithm. Finally, all 106 morphological, shape and color features were evaluated together, resulting in a classification accuracy of over 99% for all algorithms; the highest, 99.91%, belongs to the multilayer perceptron algorithm. The F1-score values obtained from the classification algorithms are high, which indicates that the algorithms perform well in terms of both precision and recall. In addition, the kappa coefficient values, which measure the reliability of the classification algorithms, indicate a very good level of agreement. Overall, the results show that increasing the quality and number of features used in the study contributed positively to classification success.
When the confusion matrices and performance measurement values of the algorithms were examined, the Ipsala variety was found to reach the highest performance values in all feature sets. The Arborio variety, on the other hand, has a lower accuracy rate than the other varieties. The results show that the approach used in this study can successfully classify different varieties of rice.

Discussion
The 106 features used in the study can also be extracted for other rice varieties. Using the data obtained, an automatic image acquisition system could be designed to distinguish rice varieties, and on this basis a machine could be built that performs calibration operations or separates unwanted materials from the varieties.
The decisive features among the 106 used in the study can be identified, and classification can then be performed with these features alone. By increasing the number of rice varieties, a database of rice features can be created. This database could be adapted to a mobile application and made available in the field of agriculture. Through this application, information such as the rice variety and its physical features could be accessed instantly.