Research Article

Using the Polygon Area Metric for Evaluation of Classifier Performance in the Classification of Unbalanced Datasets

Year 2022, Volume 27, Issue 2, 194 - 205, 30.08.2022
https://doi.org/10.53433/yyufbed.1066340

Abstract

In recent years, machine learning methods have been used in many disciplines. To determine the most suitable classifier, researchers test many methods against many metrics. The classical metrics used to compare classifier performance (classification accuracy, specificity, sensitivity, area under the curve, Jaccard index and F metric) produce large tables that are hard to follow, especially for irregular data sets. Moreover, a classifier may perform well in terms of one metric and poorly in terms of another. All of this complicates the choice of the most suitable classifier. This study shows that the polygon area metric (PAM) can be used to compare classifier performance on irregular data sets. The metric is calculated from the area of the polygon that classification accuracy, specificity, sensitivity, area under the curve, Jaccard index and the F metric form on a regular hexagon, and it also visualizes the values of these classical metrics on the same hexagon. It is concluded that classifier performance can be compared effectively with this method.
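The calculation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names, the confusion-matrix counts and the AUC value are hypothetical, and two points are assumptions that the abstract does not state, namely the order in which the six metrics are placed around the hexagon (which affects the area) and the normalization of the polygon area by the area of the full hexagon so that a perfect classifier scores 1.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Classical metrics from binary confusion-matrix counts.

    Assumes every denominator is nonzero (no empty class, at least
    one positive prediction or positive sample).
    """
    return {
        "CA": (tp + tn) / (tp + fp + tn + fn),  # classification accuracy
        "SE": tp / (tp + fn),                   # sensitivity
        "SP": tn / (tn + fp),                   # specificity
        "JI": tp / (tp + fp + fn),              # Jaccard index
        "FM": 2 * tp / (2 * tp + fp + fn),      # F metric (F1)
    }


def polygon_area_metric(values):
    """Area of the polygon spanned by `values` on a regular n-gon,
    divided by the area of the full n-gon (all values equal to 1).

    Each value in [0, 1] is the distance from the centre along one
    axis; neighbouring axes are 2*pi/n apart. The polygon is split
    into n triangles, each with area 0.5 * r_i * r_next * sin(2*pi/n).
    """
    n = len(values)
    if n < 3:
        raise ValueError("a polygon needs at least three axes")
    wedge = 0.5 * math.sin(2 * math.pi / n)
    area = wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))
    full_area = wedge * n  # assumed normalization: unit-radius n-gon
    return area / full_area


# Hypothetical imbalanced test set: 50 positives, 950 negatives.
m = confusion_metrics(tp=40, fp=50, tn=900, fn=10)
auc = 0.90  # hypothetical value; AUC needs classifier scores, not counts

# Assumed axis order around the hexagon: CA, SE, SP, AUC, JI, FM.
axes = [m["CA"], m["SE"], m["SP"], auc, m["JI"], m["FM"]]
print({k: round(v, 3) for k, v in m.items()})
print("PAM:", round(polygon_area_metric(axes), 3))
```

In this hypothetical case the accuracy is high because the negative class dominates, while the Jaccard index and F metric are much lower. Since each triangle's area is the product of two neighbouring metrics, a single weak metric reduces the PAM score, which is how one number can summarize the six values.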

Dengesiz Veri Kümelerinin Sınıflandırılmasında Poligon Alan Metriğinin Sınıflandırıcı Performans Değerlendirilmesi İçin Kullanılması

There are 23 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors: Önder Aydemir (ORCID 0000-0002-1177-8518)
Publication Date: August 30, 2022
Submission Date: February 1, 2022
Published in Issue: Year 2022

Cite

APA Aydemir, Ö. (2022). Dengesiz Veri Kümelerinin Sınıflandırılmasında Poligon Alan Metriğinin Sınıflandırıcı Performans Değerlendirilmesi İçin Kullanılması. Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 27(2), 194-205. https://doi.org/10.53433/yyufbed.1066340