An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms

Mustafa Demir; İbrahim Kılıç

doi:10.35414/akufemubid.1153610

Research Article

Year 2022, Volume 22, Issue 6, 1307 - 1313, 28.12.2022

Abstract

This study aims to identify a smaller set of meaningful variables, by means of feature selection methods, from the large number of variables in a data set. Feature selection methods have gained considerable importance in statistics in recent years; they are effective and make researchers' work much easier. Depending on the technique used within a method, a different number of variables is admitted to the model, and the correct classification rate may change accordingly. Being able to represent a data set that has a large number of variables with fewer variables, while keeping a high classification percentage, therefore saves time and cost. In this study, the variables in the data set were first analyzed with different feature selection methods to create new data sets. These new data sets, each containing a different number of variables, were then analyzed with different machine learning techniques to determine the best-performing technique. The study was carried out on chronic kidney disease data, whose variables were classified using different feature selection methods. The highest classification rate, 99.75%, was obtained with the correlation-based feature selection method combined with the random forest and multilayer perceptron techniques, and the same rate was obtained with the filter method combined with the k-nearest neighbor technique. Compared with earlier studies that used the same data set, the correct classification percentage obtained in this study is higher.
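The two-stage procedure described in the abstract (select features, then classify and compare correct classification rates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' setup: the data are synthetic and not the chronic kidney disease data, the filter is a plain ranking by absolute Pearson correlation with the class label (simpler than correlation-based feature selection, which also accounts for redundancy among features), and the helper names `select_top_k`, `loo_accuracy` and `make_toy_data` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(X, y, k):
    """Filter step: indices of the k features most correlated with the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out correct classification rate using only the given features."""
    Z = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(Z)):
        train_X = Z[:i] + Z[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, Z[i], k) == y[i]
    return hits / len(Z)


def make_toy_data(n=60, seed=0):
    """Synthetic data (assumption): features 0 and 1 carry signal, 2..5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = select_top_k(X, y, k=2)
acc_all = loo_accuracy(X, y, list(range(len(X[0]))))
acc_selected = loo_accuracy(X, y, selected)
print("selected features:", selected)
print(f"correct classification rate, all features: {acc_all:.3f}")
print(f"correct classification rate, selected features: {acc_selected:.3f}")
```

For brevity the sketch ranks features once on the full data set; a more careful evaluation would repeat the selection inside each validation fold so that the held-out row does not influence which features are kept.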

Keywords

Classification methods, Feature selection methods, Machine learning, Chronic kidney disease

Details

Primary Language	English
Journal Section	Articles
Authors	Mustafa Demir (ORCID 0000-0001-5684-7778); İbrahim Kılıç (ORCID 0000-0003-0595-8771)
Publication Date	December 28, 2022
Submission Date	August 3, 2022
Published in Issue	Year 2022, Volume 22, Issue 6

Cite

APA	Demir, M., & Kılıç, İ. (2022). An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 22(6), 1307-1313. https://doi.org/10.35414/akufemubid.1153610
AMA	Demir M, Kılıç İ. An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. December 2022;22(6):1307-1313. doi:10.35414/akufemubid.1153610
Chicago	Demir, Mustafa, and İbrahim Kılıç. “An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 22, no. 6 (December 2022): 1307-13. https://doi.org/10.35414/akufemubid.1153610.
EndNote	Demir M, Kılıç İ (December 1, 2022) An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 22 6 1307–1313.
IEEE	M. Demir and İ. Kılıç, “An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 22, no. 6, pp. 1307–1313, 2022, doi: 10.35414/akufemubid.1153610.
ISNAD	Demir, Mustafa - Kılıç, İbrahim. “An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 22/6 (December 2022), 1307-1313. https://doi.org/10.35414/akufemubid.1153610.
JAMA	Demir M, Kılıç İ. An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2022;22:1307–1313.
MLA	Demir, Mustafa and İbrahim Kılıç. “An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 22, no. 6, 2022, pp. 1307-13, doi:10.35414/akufemubid.1153610.
Vancouver	Demir M, Kılıç İ. An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2022;22(6):1307-13.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.