Comparative study of feature extraction methods for automated ICD code classification using MIMIC-III medical notes and deep learning models

Dilek Aydoğan Kılıç; Deniz Kenan Kılıç; Izabela Ewa Nielsen

doi:10.53391/mmnsa.1666223

Abstract

The International Classification of Diseases (ICD) standardizes diagnosis codes globally, supporting payments, research, planning, and quality management. Its complexity leads to longer exams, higher training costs, increased workforce needs, coding errors, and unreliable data. Automated ICD coding systems based on machine learning (ML) address these issues. Long medical notes complicate ML, which makes feature extraction crucial for efficient ICD classification. Despite numerous studies, no systematic analysis of feature extraction methods exists, especially for deep learning (DL). This study uses the MIMIC-III dataset with two preprocessing combinations, fundamental and advanced. The TF-IDF, word2vec, GloVe, fastText, and BERT feature extraction methods are compared using the DL models NN, CNN, and BiLSTM. For word2vec and fastText, the CBOW and skip-gram architectures are also compared. ROC-AUC, F1-score, precision, and recall are calculated to measure DL performance. Advanced preprocessing improves performance for all feature extraction and DL methods. The best results with advanced preprocessing are a micro ROC-AUC of 91.74% (BiLSTM+fastText (skip-gram)), a macro ROC-AUC of 88.58% (BiLSTM+word2vec (CBOW)), micro F1/precision of 64.84%/62.34% (BiLSTM+word2vec (CBOW)), a micro recall of 68.16% (BiLSTM+fastText (skip-gram)), macro F1/precision of 59.67%/57.71% (BiLSTM+word2vec (CBOW)), and a macro recall of 63.38% (BiLSTM+fastText (skip-gram)). FastText is the most successful feature extraction method in DL models with fundamental preprocessing. With well-implemented preprocessing, however, other feature extraction methods perform better and run faster. As DL model performance improves, the differences between feature extraction methods diminish. Although the study does not aim for the best possible results, CNN and BiLSTM with word2vec, GloVe, and fastText are competitive with current studies. Lastly, if computing power is limited, CNN may be preferable to BiLSTM with these feature extraction methods.
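The abstract reports each metric in both micro- and macro-averaged form. As a minimal sketch of how the two averages differ for multi-label code assignment, the following pure-Python example pools counts over all codes (micro) and averages per-code scores (macro). The function names and the toy data are hypothetical, and macro F1 is taken here as the unweighted mean of per-code F1, which is one common convention and not necessarily the definition used in the study.

```python
from typing import Dict, Sequence


def prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, and F1 from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def micro_macro(y_true: Sequence[Sequence[int]],
                y_pred: Sequence[Sequence[int]]) -> Dict[str, Dict[str, float]]:
    """Micro- and macro-averaged scores for multi-label 0/1 matrices.

    Rows are notes, columns are codes (labels).
    """
    n_labels = len(y_true[0])
    counts = []  # one (tp, fp, fn) tuple per label
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # Micro: pool the counts over all labels, then compute the scores once.
    micro = prf(sum(c[0] for c in counts),
                sum(c[1] for c in counts),
                sum(c[2] for c in counts))

    # Macro: compute the scores per label, then take the unweighted mean.
    per_label = [prf(*c) for c in counts]
    macro = {k: sum(s[k] for s in per_label) / n_labels
             for k in ("precision", "recall", "f1")}
    return {"micro": micro, "macro": macro}


# Toy data (hypothetical): 4 notes, 3 codes; code 0 is frequent, code 2 is rare.
Y_TRUE = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]
Y_PRED = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
SCORES = micro_macro(Y_TRUE, Y_PRED)
```

In this toy case the rare code is never predicted, so it pulls the macro average down while barely affecting the micro average. The same effect is one plausible reason macro scores sit below micro scores when code frequencies are imbalanced.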

Keywords

deep learning (DL), natural language processing (NLP), feature extraction, international classification of diseases (ICD), MIMIC-III medical notes

Project Number

This work was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) under the 2023 International Postdoctoral Research Fellowship Program (2219) [grant number 1059B192302269].

Details

Primary Language

English

Subjects

Applied Mathematics (Other)

Journal Section

Research Article

Authors

Dilek Aydoğan Kılıç
0000-0002-9194-9400
Denmark

Deniz Kenan Kılıç *
0000-0002-6996-3425
Denmark

Izabela Ewa Nielsen
0000-0002-3506-2741
Denmark

Early Pub Date

July 15, 2025

Publication Date

June 30, 2025

Submission Date

March 26, 2025

Acceptance Date

June 29, 2025

Published in Issue

Year 2025 Volume: 5 Number: 2

DOI

https://doi.org/10.53391/mmnsa.1666223

IZ

https://izlik.org/JA89LG36BG

APA

Aydoğan Kılıç, D., Kılıç, D. K., & Nielsen, I. E. (2025). Comparative study of feature extraction methods for automated ICD code classification using MIMIC-III medical notes and deep learning models. Mathematical Modelling and Numerical Simulation With Applications, 5(2), 421-450. https://doi.org/10.53391/mmnsa.1666223
