Research Article

Comparative study of feature extraction methods for automated ICD code classification using MIMIC-III medical notes and deep learning models

Volume: 5 Number: 2 June 30, 2025

Abstract

The International Classification of Diseases (ICD) standardizes diagnosis codes globally, aiding payments, research, planning, and quality management. Its complexity leads to longer exams, higher training costs, increased workforce needs, coding errors, and unreliable data. Automated ICD coding systems based on machine learning (ML) address these issues. Long medical notes complicate ML, making feature extraction crucial for efficient ICD classification. Despite numerous studies, no systematic analysis of feature extraction methods exists, especially for deep learning (DL). This study uses the MIMIC-III dataset with two preprocessing combinations, fundamental and advanced. The TF-IDF, word2vec, GloVe, fastText, and BERT feature extraction methods are compared using DL models such as NN, CNN, and BiLSTM. For word2vec and fastText, the CBOW and skip-gram architectures are also compared. ROC-AUC, F1-score, precision, and recall are calculated to evaluate DL performance. Advanced preprocessing improves performance for all feature extraction and DL methods. The best results with advanced preprocessing are a micro ROC-AUC of 91.74% (BiLSTM+fastText (skip-gram)), a macro ROC-AUC of 88.58% (BiLSTM+word2vec (CBOW)), micro F1/precision of 64.84%/62.34% (BiLSTM+word2vec (CBOW)), a micro recall of 68.16% (BiLSTM+fastText (skip-gram)), macro F1/precision of 59.67%/57.71% (BiLSTM+word2vec (CBOW)), and a macro recall of 63.38% (BiLSTM+fastText (skip-gram)). FastText is the most successful feature extraction method in DL models with fundamental preprocessing. However, with well-implemented preprocessing, other feature extraction methods perform better and run faster. As DL model performance improves, the differences between feature extraction methods diminish. Although this study does not focus on achieving the best results, CNN and BiLSTM with word2vec, GloVe, and fastText are competitive with current studies. Lastly, if computing power is limited, CNN may be preferable to BiLSTM with these feature extraction methods.
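The abstract reports each metric in both micro- and macro-averaged form. The sketch below is only an illustration of how the two averages can differ in a multi-label setting; it is not the authors' evaluation code. The three labels and four notes are made-up toy data, the function names are hypothetical, and macro F1 is taken here as the mean of per-label F1 scores, which is one common convention that the abstract does not confirm.

```python
from typing import List, Tuple

def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def micro_macro(y_true: List[List[int]], y_pred: List[List[int]]):
    """Return (micro, macro) triples of (precision, recall, F1)."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute the metrics once.
    micro = prf(sum(c[0] for c in counts),
                sum(c[1] for c in counts),
                sum(c[2] for c in counts))
    # Macro: compute the metrics per label, then take the unweighted mean.
    per_label = [prf(*c) for c in counts]
    macro = tuple(sum(s[i] for s in per_label) / n_labels for i in range(3))
    return micro, macro

# Toy data (assumption): rows are notes, columns are three hypothetical codes.
# The first code is frequent and predicted well; the second is rare and missed.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 1]]

micro, macro = micro_macro(y_true, y_pred)
print("micro P/R/F1:", [round(v, 3) for v in micro])
print("macro P/R/F1:", [round(v, 3) for v in macro])
```

In this toy case the micro scores come out higher than the macro scores, because the frequent, well-predicted label dominates the pooled counts while the rare, missed label pulls down the per-label mean. The same direction of difference appears between the micro and macro figures reported above, although the toy data says nothing about why it occurs in the study itself.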

Keywords

deep learning (DL), natural language processing (NLP), feature extraction, international classification of diseases (ICD), MIMIC-III medical notes

Project Number

This work was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) through the International Postdoctoral Research Fellowship Program (2219) of 2023 [grant number 1059B192302269].

APA
Aydoğan Kılıç, D., Kılıç, D. K., & Nielsen, I. E. (2025). Comparative study of feature extraction methods for automated ICD code classification using MIMIC-III medical notes and deep learning models. Mathematical Modelling and Numerical Simulation With Applications, 5(2), 421-450. https://doi.org/10.53391/mmnsa.1666223