Advanced Multi-Label Classification for Predicting Diverse Diseases from Comprehensive Symptom Data Using Random Forests
Abstract
This paper presents the design and evaluation of a multi-label classification system that predicts multiple diseases from symptom-based input data. Using a dataset of 653 patient records provided by home healthcare clinics, covering 92 symptoms and 282 potential diseases, we applied a Random Forest classifier within a multi-output framework. To assess robustness, we compared the Random Forest model with other machine learning algorithms, including Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Gradient Boosting. The Random Forest model achieved the highest accuracy, 98%, while the other models also performed competitively. The results indicate that the proposed model can serve as a reliable support tool in clinical environments, assisting early diagnosis and improving the overall quality of care. The approach is distinguished by handling the inherent multi-label complexity of medical diagnosis while maintaining high accuracy.
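The multi-output setup described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, binary symptom indicators, and one binary column per disease; none of these details are confirmed here. Only the dimensions (653 records, 92 symptoms, 282 diseases) come from the abstract. The data and the labelling rule are synthetic, so the resulting scores say nothing about the reported 98%. The sketch also reports two common readings of "accuracy" for multi-label output, since the abstract does not state which one was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Dimensions taken from the abstract; the data itself is synthetic.
N_RECORDS, N_SYMPTOMS, N_DISEASES = 653, 92, 282

rng = np.random.default_rng(0)
# Assumed encoding: 1 means the symptom is present, 0 means absent.
X = rng.integers(0, 2, size=(N_RECORDS, N_SYMPTOMS))
# Hypothetical labelling rule so that labels depend on the inputs:
# disease j is flagged when two randomly assigned symptoms are both present.
pairs = rng.integers(0, N_SYMPTOMS, size=(N_DISEASES, 2))
Y = (X[:, pairs[:, 0]] & X[:, pairs[:, 1]]).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Both estimators accept a 2-D label matrix directly (multi-output).
models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    scores[name] = {
        # Fraction of records whose full disease vector is predicted exactly.
        "subset_accuracy": accuracy_score(Y_test, Y_pred),
        # Fraction of individual (record, disease) cells predicted correctly.
        "hamming_score": 1.0 - hamming_loss(Y_test, Y_pred),
    }

for name, result in scores.items():
    print(name, result)
```

Subset accuracy is the stricter of the two measures and can never exceed the per-label Hamming score. Estimators without native multi-output support, such as scikit-learn's `SVC` and `GradientBoostingClassifier`, would need a wrapper like `sklearn.multioutput.MultiOutputClassifier` to be compared in the same loop.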
Keywords
Ethics Statement
Ethics committee approval was not required for this study because it involved no experiments on animals or humans.
Acknowledgments
The authors thank Md. Mohammed Ahmed for providing the disease diagnosis dataset.
Details
Primary Language
English
Subjects
Information Systems (Other)
Section
Research Article
Early View Date
24 December 2025
Publication Date
24 December 2025
Submission Date
30 June 2025
Acceptance Date
22 December 2025
Published in Issue
Year 2026 Volume: 9 Issue: 1
APA
Sevinç, Ö., & Yılmaz, A. A. (2026). Advanced Multi-Label Classification for Predicting Diverse Diseases from Comprehensive Symptom Data Using Random Forests. Black Sea Journal of Engineering and Science, 9(1), 295-304. https://doi.org/10.34248/bsengineering.1728860