Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data

Muhammad Nazeer Musa; Philip Odion; Martins E. Irhebhude

doi:10.35377/saucis...1788178

Abstract

Machine learning classifiers deployed in high-stakes domains such as healthcare and finance face two compounding challenges, class imbalance and asymmetric misclassification costs, that traditional evaluation metrics address poorly. This study develops and validates the Cost-Weighted Harmonic (CWH) score, a novel, bounded performance metric for high-stakes imbalanced classification that unifies precision, recall, and specificity in a normalized harmonic mean explicitly weighted by a user-defined cost ratio. Unlike cost-agnostic metrics (e.g., F1, HMRS) or unbounded cost-aware scores (e.g., C-score), CWH is interpretable and stable, and it aligns evaluation with domain-specific risk priorities. The metric is integrated with threshold optimization and validated on healthcare, cybersecurity, and financial datasets, demonstrating superior stability and up to a 69% performance improvement over C-score in life-critical scenarios without excessive false positives. CWH thus bridges the gap between statistical evaluation and operational decision-making, offering practitioners a reliable tool for model selection.
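The abstract does not state the CWH formula, so the following is only an illustrative sketch of the general idea it describes: a weighted harmonic mean of precision, recall, and specificity, paired with a simple threshold sweep. The weighting scheme (recall weighted by the cost ratio, the other two components weighted by 1), the function names, and the threshold grid are all assumptions made for illustration and should not be read as the authors' definition.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1/True is the positive class."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    return tp, fp, fn, tn


def weighted_harmonic_score(y_true, y_pred, cost_ratio=1.0):
    """Hypothetical cost-weighted harmonic mean of precision, recall, specificity.

    ASSUMPTION: cost_ratio = (cost of a false negative) / (cost of a false
    positive) is applied as the weight on recall; precision and specificity
    get weight 1. This is NOT the published CWH formula.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    components = (precision, recall, specificity)
    weights = (1.0, float(cost_ratio), 1.0)
    # A harmonic mean collapses to 0 when any component is 0.
    if min(components) == 0.0:
        return 0.0
    # Dividing by the weight total keeps the result inside [0, 1].
    return sum(weights) / sum(w / c for w, c in zip(weights, components))


def best_threshold(y_true, scores, cost_ratio=1.0, grid=None):
    """Sweep decision thresholds and return (threshold, score) with the best score."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    scores = np.asarray(scores)
    results = [
        (weighted_harmonic_score(y_true, scores >= t, cost_ratio), float(t))
        for t in grid
    ]
    best_score, best_t = max(results)  # ties resolve to the larger threshold
    return best_t, float(best_score)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    for ratio in (1.0, 5.0):
        print(ratio, best_threshold(y_true, scores, cost_ratio=ratio))
```

On this toy input, raising the cost ratio moves the selected threshold lower, because a heavier recall weight makes missed positives cost more than additional false alarms. Because the score is a normalized weighted mean of quantities in [0, 1], it stays in [0, 1] for any positive cost ratio, which is the boundedness property the abstract contrasts with unbounded cost-aware scores.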

Details

Primary Language

English

Subjects

Artificial Intelligence (Other)

Journal Section

Research Article

Authors

Muhammad Nazeer Musa *
0009-0007-9420-2066
Nigeria

Philip Odion
0009-0006-2194-1370
Nigeria

Martins E. Irhebhude
0000-0003-3075-6741
Nigeria

Early Pub Date

June 8, 2026

Publication Date

June 17, 2026

Submission Date

September 29, 2025

Acceptance Date

February 8, 2026

Published in Issue

Year 2026 Volume: 9 Number: 2

DOI

https://doi.org/10.35377/saucis...1788178

IZ

https://izlik.org/JA52JJ95NY

Cite

APA

Musa, M. N., Odion, P., & Irhebhude, M. E. (2026). Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data. Sakarya University Journal of Computer and Information Sciences, 9(2), 494-516. https://doi.org/10.35377/saucis...1788178

AMA

1. Musa MN, Odion P, Irhebhude ME. Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data. SAUCIS. 2026;9(2):494-516. doi:10.35377/saucis.1788178

Chicago

Musa, Muhammad Nazeer, Philip Odion, and Martins E. Irhebhude. 2026. “Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data”. Sakarya University Journal of Computer and Information Sciences 9 (2): 494-516. https://doi.org/10.35377/saucis.1788178.

EndNote

Musa MN, Odion P, Irhebhude ME (June 1, 2026) Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data. Sakarya University Journal of Computer and Information Sciences 9 2 494–516.

IEEE

[1] M. N. Musa, P. Odion, and M. E. Irhebhude, “Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data”, SAUCIS, vol. 9, no. 2, pp. 494–516, June 2026, doi: 10.35377/saucis...1788178.

ISNAD

Musa, Muhammad Nazeer - Odion, Philip - Irhebhude, Martins E. “Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data”. Sakarya University Journal of Computer and Information Sciences 9/2 (June 1, 2026): 494-516. https://doi.org/10.35377/saucis.1788178.

JAMA

1. Musa MN, Odion P, Irhebhude ME. Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data. SAUCIS. 2026;9:494–516.

MLA

Musa, Muhammad Nazeer, et al. “Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 2, June 2026, pp. 494-516, doi:10.35377/saucis.1788178.

Vancouver

1. Muhammad Nazeer Musa, Philip Odion, Martins E. Irhebhude. Cost-Weighted Harmonic Score: A Unified Metric for Cost-Sensitive Classification on High-Stakes Imbalanced Data. SAUCIS. 2026 Jun. 1;9(2):494-516. doi:10.35377/saucis.1788178