Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset

Ahmet Çelik

doi:10.35414/akufemubid.1263900

Research Article

Year 2023, Volume: 23 Issue: 5, 1142 - 1149, 30.10.2023

Abstract

Machine learning algorithms are widely used for product sorting in the food industry. Classification
relies on the attributes of the products, and these attributes differ from product to product. In this
study, the k-nearest neighbor (KNN) algorithm was used to classify the Kama, Rosa and Canadian wheat
varieties. The Seeds dataset from the UCI (University of California, Irvine) Machine Learning
Repository was used. The dataset contains 70 samples of each wheat class. In addition, the effect of
the distance metric and of the amount of training data on classification success was measured. The
wheat samples were randomly selected, and a soft X-ray technique was used in the experimental
environment to obtain high-quality images of the internal kernel structure. The classification
success of the KNN algorithm was tested with training ratios ranging from 50% to 90% of the dataset.
Because the neighborhood value k affects classification success, the values k = 1, 3 and 5 were
selected. The Euclidean, Chebyshev, Manhattan and Mahalanobis distance metrics of the KNN algorithm
were tested for each value of k. According to the results, the Mahalanobis metric with a neighborhood
value of k = 3 achieved a classification success of 0.9924 according to the AUC (Area Under the
Curve) metric. No study was found in the literature that jointly compares the KNN algorithm,
neighborhood values and distance metrics on food datasets using varying training and test data.
Therefore, the study is expected to make an important contribution to the literature.
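
The abstract does not reproduce the study's code, so the following is only a minimal sketch of the comparison it describes: a KNN majority vote evaluated with the Euclidean, Manhattan, Chebyshev and Mahalanobis distances for k = 1, 3 and 5. All function names are hypothetical, the two-feature training points are made up for illustration and are not taken from the Seeds dataset, and estimating the Mahalanobis covariance from the training rows is an assumption rather than the paper's stated procedure.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def covariance(rows):
    """Sample covariance matrix (divides by n - 1) of a list of feature rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    d = len(m)
    aug = [list(row) + [float(i == j) for j in range(d)]
           for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def make_mahalanobis(train_rows):
    """Build a Mahalanobis distance from the training covariance (assumption)."""
    vi = invert(covariance(train_rows))

    def dist(a, b):
        diff = [x - y for x, y in zip(a, b)]
        q = sum(diff[i] * vi[i][j] * diff[j]
                for i in range(len(diff)) for j in range(len(diff)))
        return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding
    return dist


def knn_predict(train_x, train_y, query, k, dist):
    """Majority vote among the k nearest training points.

    Ties go to the label that appears first among the nearest neighbors.
    """
    nearest = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up two-feature points in three classes (NOT the Seeds data).
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
           (5.0, 5.2), (5.1, 4.7), (4.8, 5.0),
           (9.0, 1.0), (9.2, 1.4), (8.7, 0.9)]
train_y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

metrics = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "mahalanobis": make_mahalanobis(train_x),
}

query = (1.1, 1.0)
results = {}
for name, dist in metrics.items():
    for k in (1, 3, 5):
        results[(name, k)] = knn_predict(train_x, train_y, query, k, dist)
        print(name, k, results[(name, k)])
```

On real data, the same loop would be repeated for each training ratio (for example 50% to 90%) and scored on the held-out samples; the scoring step is omitted here because the abstract does not say how the AUC was computed for three classes.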

Keywords

Machine learning, Classification, Seeds dataset, Distance metric methods, Random sampling, KNN algorithm

Turkish title: KNN Algoritması Uzaklık Metrik Yöntemlerinin Buğday Tohumları Veri Seti Üzerinde Sınıflandırma Başarısının Tespit Edilmesi

Details

Primary Language	English
Subjects	Artificial Intelligence
Journal Section	Articles
Authors	Ahmet Çelik 0000-0002-6288-3182
Early Pub Date	October 27, 2023
Publication Date	October 30, 2023
Submission Date	March 12, 2023
Published in Issue	Year 2023 Volume: 23 Issue: 5

Cite

APA	Çelik, A. (2023). Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 23(5), 1142-1149. https://doi.org/10.35414/akufemubid.1263900
AMA	Çelik A. Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. October 2023;23(5):1142-1149. doi:10.35414/akufemubid.1263900
Chicago	Çelik, Ahmet. “Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 23, no. 5 (October 2023): 1142-49. https://doi.org/10.35414/akufemubid.1263900.
EndNote	Çelik A (October 1, 2023) Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 23 5 1142–1149.
IEEE	A. Çelik, “Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 23, no. 5, pp. 1142–1149, 2023, doi: 10.35414/akufemubid.1263900.
ISNAD	Çelik, Ahmet. “Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 23/5 (October 2023), 1142-1149. https://doi.org/10.35414/akufemubid.1263900.
JAMA	Çelik A. Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2023;23:1142–1149.
MLA	Çelik, Ahmet. “Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 23, no. 5, 2023, pp. 1142-9, doi:10.35414/akufemubid.1263900.
Vancouver	Çelik A. Determination of the Classification Success of KNN Algorithm Distance Metric Methods on Wheat Seeds Dataset. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2023;23(5):1142-9.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.