Feature Engineering for Parkinson’s Disease Diagnosis: A Hybrid Approach Using Random Forest Feature Selection and Correlation Analysis

Ramiz Görkem Birdal

doi:10.18185/erzifbed.1640563


Abstract

Feature selection is a crucial step in optimizing machine learning models, particularly in biomedical applications such as Parkinson’s disease classification from speech data. This study applies multiple feature importance techniques to identify the most significant predictors and remove redundant variables, thereby improving model interpretability and efficiency. Four methods are used to assess the contribution of each feature to classification performance: Permutation Importance, Mutual Information (MI), ANOVA F-score, and Random Forest Importance. In addition, a correlation analysis detects highly correlated features that may introduce multicollinearity. Many existing studies on Parkinson’s disease classification overlook multicollinearity and redundant features, both of which can affect model stability and interpretability. This study addresses that gap by systematically comparing the four feature selection methods and using correlation analysis to refine the feature set. The refined feature set balances model complexity against predictive power, with the aim of improving the accuracy, efficiency, and reliability of automated Parkinson’s disease diagnosis from speech recordings.
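The workflow described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn and pandas. It is not the author's implementation. The synthetic data from `make_classification`, the helper names `rank_features`, `drop_correlated`, and `make_demo_data`, the 0.9 correlation threshold, and the choice to break ties between correlated features by Random Forest importance are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_features(X, y, seed=0):
    """Score every feature with the four methods named in the abstract."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)

    # Permutation importance is measured on held-out data.
    perm = permutation_importance(
        forest, X_test, y_test, n_repeats=10, random_state=seed
    )
    f_scores, _ = f_classif(X_train, y_train)
    mi = mutual_info_classif(X_train, y_train, random_state=seed)

    return pd.DataFrame(
        {
            "permutation": perm.importances_mean,
            "mutual_info": mi,
            "anova_f": f_scores,
            "random_forest": forest.feature_importances_,
        },
        index=X.columns,
    )


def drop_correlated(X, scores, threshold=0.9):
    """Visit features from highest to lowest score and keep a feature only
    if its absolute Pearson correlation with every kept feature is at most
    `threshold` (an assumed cut-off, not taken from the paper)."""
    corr = X.corr().abs()
    kept = []
    for name in scores.sort_values(ascending=False).index:
        if all(corr.loc[name, other] <= threshold for other in kept):
            kept.append(name)
    return kept


def make_demo_data(seed=0):
    """Synthetic stand-in for a table of speech features (assumption)."""
    features, labels = make_classification(
        n_samples=300, n_features=8, n_informative=4,
        n_redundant=0, random_state=seed,
    )
    X = pd.DataFrame(features, columns=[f"f{i}" for i in range(8)])
    # Add a near-duplicate column so the correlation filter has work to do.
    rng = np.random.default_rng(seed)
    X["f0_copy"] = X["f0"] + rng.normal(scale=0.01, size=len(X))
    return X, pd.Series(labels, name="label")


X, y = make_demo_data()
table = rank_features(X, y)
selected = drop_correlated(X, table["random_forest"])
print(table.round(3))
print("kept:", selected)
```

Comparing the four columns of `table` shows where the methods agree or disagree on a feature, and `selected` never contains both `f0` and its near-duplicate `f0_copy`. Any of the four score columns could be passed to `drop_correlated` in place of the Random Forest one.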



Details

Primary Language

English

Subjects

Information Modelling, Management and Ontologies, Information Systems Development Methodologies and Practice, Information Systems (Other)

Section

Research Article

Authors

Ramiz Görkem Birdal *
0000-0003-1283-0530
Türkiye

Publication Date

30 March 2026

Submission Date

15 February 2025

Acceptance Date

17 July 2025

Published in Issue

Year 2026 Volume: 19 Issue: 1

DOI

https://doi.org/10.18185/erzifbed.1640563

IZ

https://izlik.org/JA72SP23KA

Cite

APA

Birdal, R. G. (2026). Feature Engineering for Parkinson’s Disease Diagnosis: A Hybrid Approach Using Random Forest Feature Selection and Correlation Analysis. Erzincan University Journal of Science and Technology, 19(1), 331-356. https://doi.org/10.18185/erzifbed.1640563
