Research Article

Advanced Multi-Label Classification for Predicting Diverse Diseases from Comprehensive Symptom Data Using Random Forests

Volume: 9 Number: 1 December 24, 2025
Abstract

This paper presents the design and evaluation of a multi-label classification system that predicts multiple diseases from symptom-based input data. Using a dataset of 653 patient records provided by home healthcare clinics, covering 92 symptoms and 282 potential diseases, we applied a Random Forest Classifier within a multi-output framework. To assess robustness, we compared the Random Forest model with other machine learning algorithms, including Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Gradient Boosting. The Random Forest model achieved the highest accuracy, 98%, while the other models also performed competitively. The results indicate that the proposed model can serve as a reliable support tool in clinical environments, assisting in early diagnosis and enhancing the overall quality of care. The approach addresses the inherent multi-label nature of medical diagnosis, in which one patient record may map to several diseases at once, while maintaining high accuracy.
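
The multi-output Random Forest setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the problem size is far smaller than the 92 symptoms and 282 diseases reported, the rule linking symptoms to diseases is invented for the example, and the hyperparameters are arbitrary. It relies on scikit-learn's RandomForestClassifier accepting a two-dimensional label matrix, and it reports two common readings of "accuracy" in a multi-label setting, since the abstract does not say which one was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the paper's dataset): binary symptom flags
# and binary disease labels, much smaller than the problem in the abstract.
rng = np.random.default_rng(0)
n_records, n_symptoms, n_diseases = 300, 12, 5
X = rng.integers(0, 2, size=(n_records, n_symptoms))

# Hypothetical labelling rule, invented for illustration only:
# disease j is present when symptoms j and j+1 occur together.
Y = np.column_stack([X[:, j] & X[:, j + 1] for j in range(n_diseases)])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# A 2-D label matrix makes the forest predict every disease column at once.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Subset accuracy: a record counts only if every disease label is correct.
subset_accuracy = accuracy_score(Y_test, Y_pred)
# Per-label accuracy: share of individual (record, disease) cells predicted correctly.
per_label_accuracy = 1.0 - hamming_loss(Y_test, Y_pred)

print(f"subset accuracy:    {subset_accuracy:.3f}")
print(f"per-label accuracy: {per_label_accuracy:.3f}")
```

Subset accuracy can never exceed per-label accuracy, so the two figures may differ noticeably when the number of labels is large; which one a reported score refers to affects how it should be interpreted.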

Ethical Statement

Ethics committee approval was not required for this study because it did not involve any study on animals or humans.

Thanks

As authors, we thank Md. Mohammed Ahmed for providing the disease diagnosis dataset.

Details

Primary Language

English

Subjects

Information Systems (Other)

Journal Section

Research Article

Early Pub Date

December 24, 2025

Publication Date

December 24, 2025

Submission Date

June 30, 2025

Acceptance Date

December 22, 2025

Published in Issue

Year 2026 Volume: 9 Number: 1

APA
Sevinç, Ö., & Yılmaz, A. A. (2026). Advanced Multi-Label Classification for Predicting Diverse Diseases from Comprehensive Symptom Data Using Random Forests. Black Sea Journal of Engineering and Science, 9(1), 295-304. https://doi.org/10.34248/bsengineering.1728860
