Feature Selection Approach to Optimize Depression Detection from EHR Data
Year 2026, Volume: 10 Issue: 1, 116 - 128, 16.12.2025
Anju Gera, Satendra Kumar, Rupa Sharma, Arun Kumar
Abstract
Early detection of depression is critical and often requires professional skill. Depression is a complex disorder that negatively affects a person's behaviour, cognition, and emotions. It is considered one of the most serious psychiatric disorders, affecting around 280 million people worldwide. This study investigates the potential for developing an effective tool for detecting depression using various Machine Learning (ML) algorithms. The research examined a dataset of 4184 samples, which included biometric and demographic data from people with and without depression. Feature selection combined bio-inspired methods, namely Genetic Algorithms (GAs), Firefly Algorithms (FAs), and Particle Swarm Optimization (PSO), with feature elimination and Mutual Information (MI). The researchers applied a majority vote strategy across all feasible combinations of three, four, and five feature selection procedures, which notably reduced the feature set size (mean size: 34). The improved model attained classification accuracy between 83.13% and 89.09%, a significant performance increase over previous models (76%–88.49%). The suggested prediction models outperformed conventional classification models without feature selection, increasing the efficiency and efficacy of the diagnostic process. This study is a promising start toward developing a more accurate and efficient automated method for early depression identification.
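The majority vote over combinations of feature selection procedures described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the selector outputs are made-up feature indices, the label `RFE` is a hypothetical stand-in for the "feature elimination" technique, and the strict-majority threshold (more than half of the selectors in a combination) is one plausible reading of "majority vote".

```python
from collections import Counter
from itertools import combinations


def majority_vote(selections):
    """Keep features chosen by a strict majority of the given selectors."""
    counts = Counter(f for sel in selections for f in set(sel))
    threshold = len(selections) / 2
    return sorted(f for f, c in counts.items() if c > threshold)


def vote_over_combinations(selector_outputs, sizes=(3, 4, 5)):
    """Map each combination of selector names to its majority-voted feature subset."""
    results = {}
    for k in sizes:
        # Order does not matter for a vote, so combinations are enough.
        for combo in combinations(sorted(selector_outputs), k):
            results[combo] = majority_vote(
                [selector_outputs[name] for name in combo]
            )
    return results


# Hypothetical selector outputs (feature indices); NOT taken from the study.
selector_outputs = {
    "GA": {0, 1, 2, 5},
    "FA": {1, 2, 3},
    "PSO": {1, 2, 5, 7},
    "MI": {0, 1, 4},
    "RFE": {1, 2, 4, 5},  # assumed name for the feature elimination method
}

subsets = vote_over_combinations(selector_outputs)
for combo, features in subsets.items():
    print(combo, features)
```

With five selectors, the sizes three, four, and five give 10 + 5 + 1 = 16 combinations. Each voted subset would then be used to train and evaluate a classifier, and the subset sizes could be averaged to obtain a figure comparable to the mean size reported in the abstract.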