Research Article

Comparison of Some Performance Metrics Used in Multi-Class Classification Problems

Year 2025, Volume: 27 Issue: 1, 22 - 39, 30.06.2025

Abstract

The aim of this study is to compare the performance metrics used in multi-class classification problems in machine learning. To this end, a simulation study was carried out under different scenarios using 4 classification methods, and the resulting performance metrics were compared. The data used for classification were generated under scenarios that take the effect of 4 factors into account. A total of 90 scenarios were created from 3 numbers of categories of the response variable, 5 sample sizes, 3 correlation structures, and balanced versus imbalanced distributions of the response variable. Accuracy, Kappa, and Cramér's V, which are used in multi-class classification problems, served as the performance measures. The changes in the performance metrics across the scenarios were summarized in tables and compared. The comparisons from the simulation study showed that, in multi-class classification problems, Kappa is a more accurate performance metric than the other two and gives more reliable information about the classification success of a method.
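The scenario count follows from a full factorial crossing of the four factors (3 × 5 × 3 × 2 = 90). The sketch below only illustrates that arithmetic: the abstract gives the number of levels per factor but not their values, so every level label here is a hypothetical placeholder.

```python
from itertools import product

# Placeholder labels only: the abstract states how many levels each
# factor has, not which values were used in the simulation.
category_counts = ["K1", "K2", "K3"]           # 3 numbers of response categories
sample_sizes = ["n1", "n2", "n3", "n4", "n5"]  # 5 sample sizes
correlations = ["r1", "r2", "r3"]              # 3 correlation structures
balance = ["balanced", "imbalanced"]           # 2 response distributions

# Full factorial design: every combination of factor levels is one scenario.
scenarios = list(product(category_counts, sample_sizes, correlations, balance))

print(len(scenarios))  # 90
```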

Comparison of Some Performance Metrics Used in Multiple Classification Problems

Abstract

The purpose of this research is to compare the performance metrics used in multi-class classification problems in machine learning. For this purpose, a simulation study was carried out under different scenarios using 4 classification methods, and the performance metrics obtained were compared. The data used for classification were generated under different scenarios, taking into account the effect of 4 factors. In total, 90 scenarios were created by considering 3 numbers of categories of the response variable, 5 sample sizes, 3 correlation structures, and the balanced or imbalanced distribution of the response variable. The Accuracy, Kappa, and Cramér's V metrics used in multi-class classification problems served as performance measures. Changes in the performance metrics across the scenarios are summarized in tables and compared. The simulation comparisons show that Kappa is a more accurate performance metric than the other two in multi-class classification problems and that it gives more reliable information about the classification success of a method.
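To make the three measures concrete, the following is a minimal sketch that computes them from a confusion matrix using their standard textbook definitions (Cohen's kappa and the chi-square-based Cramér's V). The abstract does not say how the authors computed them, so the function name `metrics_from_confusion` and the example matrix are assumptions made purely for illustration.

```python
import math

def metrics_from_confusion(cm):
    """Return (accuracy, kappa, cramers_v) for a square confusion matrix.

    cm[i][j] is the number of cases of true class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    # Accuracy: share of cases on the diagonal.
    accuracy = sum(cm[i][i] for i in range(k)) / n

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # Cramér's V: chi-square association scaled to [0, 1].
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip empty rows/columns
                chi2 += (cm[i][j] - expected) ** 2 / expected
    nonempty_rows = sum(1 for t in row_tot if t > 0)
    nonempty_cols = sum(1 for t in col_tot if t > 0)
    min_dim = min(nonempty_rows, nonempty_cols) - 1
    cramers_v = math.sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0.0

    return accuracy, kappa, cramers_v

# Hypothetical imbalanced 3-class case: every observation is assigned
# to the majority class, so the two minority classes are never detected.
majority_only = [[90, 0, 0],
                 [5, 0, 0],
                 [5, 0, 0]]
acc, kap, v = metrics_from_confusion(majority_only)
print(acc, kap, v)
```

In this made-up case Accuracy is 0.9 while Kappa is 0, because the observed agreement equals the agreement expected by chance. This is one way to see why a chance-corrected measure can be more informative than Accuracy when the response variable is imbalanced.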


Details

Primary Language: English
Subjects: Econometric and Statistical Methods
Section: Research Article
Authors

Ali Vasfi Ağlarcı 0000-0002-9010-4537

Cengiz Bal 0000-0002-1553-2902

Publication Date: June 30, 2025
Submission Date: October 5, 2024
Acceptance Date: March 14, 2025
Published in Issue: Year 2025, Volume: 27 Issue: 1

Cite This Article

APA Ağlarcı, A. V., & Bal, C. (2025). Comparison of Some Performance Metrics Used in Multiple Classification Problems. Kastamonu Üniversitesi İktisadi ve İdari Bilimler Fakültesi Dergisi, 27(1), 22-39. https://doi.org/10.21180/iibfdkastamonu.1561910