Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods

Mustafa Alptekin Engin

doi:10.7212/karaelmasfen.1484222

Research Article

Turkish title: Veri Ayırma ve Doğrulama Yöntemleri Kullanılarak Makine Öğrenmesi Aracılığı ile Parkinson Hastalığı Tespiti

Year 2024, Volume: 14, Issue: 2, 134 - 147, 23.07.2024

Abstract

Parkinson’s disease (PD), a neurological disorder, negatively affects the lives of patients and of those responsible for their care. PD is very difficult to diagnose early by examining a person’s clinical characteristics, but it can be diagnosed using speech recordings. However, the inconsistent performance of models obtained by evaluating voice recordings with machine learning techniques limits their usefulness as an aid to physicians in making a diagnosis. This study used a database of 195 voice recordings obtained from 31 individuals, 23 of whom have PD. Using the 22 features extracted from each speech recording in the database, the recordings were classified as patient or healthy by machine learning. For this classification, the data used in the training and test phases were split at random in the ratios 90/10, 80/20, 70/30, 50/50 and 30/70, respectively. In addition, each split ratio was evaluated in the training phase using 10-fold cross-validation, 5-fold cross-validation, holdout validation and resubstitution validation. Classification was performed with quadratic discriminant, support vector machine, ensemble bagged tree, k-nearest neighbours and neural network classifiers. To account for the randomness of the data split and to obtain consistent results, all procedures were repeated 10 times. To compare the methods, the mean and standard deviation of the results were computed for the accuracy, sensitivity, specificity, precision and F1 score metrics. The k-nearest neighbours classifier with an 80/20 split ratio and 10-fold cross-validation, with a test accuracy of 95.64±3.21%, was identified as the most successful of the compared methods. It was therefore seen that much more successful results can be obtained merely by analysing the effects of the existing parameters of the classifiers.
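The five metrics named in the abstract can all be derived from the counts of a binary confusion matrix and then summarised over repeated runs. The following is a minimal sketch, not the author's implementation: the function names `binary_metrics` and `summarise` are hypothetical, the label 1 is assumed to mean "patient", and the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision and F1 for binary labels.

    `positive` is the label treated as "patient" (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def summarise(runs):
    """Mean and sample standard deviation of each metric over repeated runs.

    `runs` is a list of dicts as returned by `binary_metrics`; at least two
    runs are needed for the standard deviation to be defined.
    """
    return {
        name: (mean(r[name] for r in runs), stdev(r[name] for r in runs))
        for name in runs[0]
    }
```

Calling `summarise` on the ten per-repeat metric dictionaries would yield one (mean, standard deviation) pair per metric, which is the form in which the abstract reports its accuracy figure.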

Keywords

cross-validation, machine learning, classification, repeated train/test splitting

References

Abdulateef, SK., Ismael, AN., Salman, MD. 2023. Feature weighting for Parkinson’s identification using single hidden layer neural network. Computing, 225–230. https://doi.org/10.47839/ijc.22.2.3092
Bang, C., Bogdanovic, N., Deutsch, G., Marques, O. 2023. Machine learning for the diagnosis of Parkinson’s disease using speech analysis: a systematic review. International Journal of Speech Technology, 26(4), 991–998. https://doi.org/10.1007/s10772-023-10070-9
Bhavsar, K., Vakharia, V., Chaudhari, R., Vora, J., Pimenov, DY., Giasin, K. 2022. A comparative study to predict bearing degradation using discrete wavelet transform (DWT), tabular generative adversarial networks (TGAN) and machine learning models. Machines, 10(3), 176. https://doi.org/10.3390/machines10030176
Çağlar, MF., Çetişli, B., Toprak, İB. 2010. Automatic Recognition of Parkinson’s Disease from Sustained Phonation Tests Using ANN and Adaptive Neuro-Fuzzy Classifier. Mühendislik Bilimleri Ve Tasarım Dergisi, 1(2), 59–64.
Cortes, C., Vapnik, V. 1995. Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/bf00994018
Das, R. 2010. A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Systems with Applications, 37(2), 1568–1572. https://doi.org/10.1016/j.eswa.2009.06.040
Duda, RO., Hart, PE., Stork, DG. 2022. Pattern Classification (3rd ed.). Standards Information Network.
Ekpezu, AO., Katsriku, F., Yaokumah, W., Wiafe, I. 2022. The use of machine learning algorithms in the classification of sound: A systematic review. International Journal of Service Science Management Engineering and Technology, 13(1), 1–28. https://doi.org/10.4018/ijssmet.298667
Ene, M. 2008. Neural network-based approach to discriminate healthy people from those with Parkinson’s disease. Annals of the University of Craiova-Mathematics and Computer Science Series, 35, 112–116.
Esmer, S., Uçar, MK., Çil, İ., Bozkurt, MR. 2020. Parkinson Hastalığı Teşhisi İçin Makine Öğrenmesi Tabanlı Yeni Bir Yöntem. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 8(3), 1877–1893. https://doi.org/10.29130/dubited.688223
Fix, E., Hodges, JL. 1951. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties. USAF School of Aviation Medicine.
Gupta, D., Julka, A., Jain, S., Aggarwal, T., Khanna, A., Arunkumar, N., de Albuquerque, VHC. 2018. Optimized cuttlefish algorithm for diagnosis of Parkinson’s disease. Cognitive Systems Research, 52, 36–48. https://doi.org/10.1016/j.cogsys.2018.06.006
Huang, Y., Chen, Q., Wang, Z., Wang, Y., Lian, A., Zhou, Q., Zhao, G., Xia, K., Tang, B., Li, B., Li, J. 2024. Risk factors associated with age at onset of Parkinson’s disease in the UK Biobank. NPJ Parkinson’s Disease, 10(1), 3. https://doi.org/10.1038/s41531-023-00623-9
Inzamam-Ul-Hossain, M., MacKinnon, L., Islam, MR. 2015. Parkinson disease detection using ensemble method in PASW benchmark. 2015 IEEE International Advance Computing Conference (IACC).
Islam, MA., Hasan Majumder, MZ., Hussein, MA., Hossain, KM., Miah, MS. 2024. A review of machine learning and deep learning algorithms for Parkinson’s disease detection using handwriting and voice datasets. Heliyon, 10(3), e25469. https://doi.org/10.1016/j.heliyon.2024.e25469
Iyer, A., Kemp, A., Rahmatallah, Y., Pillai, L., Glover, A., Prior, F., Larson-Prior, L., Virmani, T. 2023. A machine learning method to process voice samples for identification of Parkinson’s disease. Scientific Reports, 13(1), 20615. https://doi.org/10.1038/s41598-023-47568-w
James, G., Witten, D., Hastie, T., Tibshirani, R. 2013. An introduction to statistical learning: With applications in R (1st ed.). Springer.
Jana, DK., Bhunia, P., Adhikary, SD., Mishra, A. 2023. Analyzing of salient features and classification of wine type based on quality through various neural network and support vector machine classifiers. Results in Control and Optimization, 11(100219), 100219. https://doi.org/10.1016/j.rico.2023.100219
Little, M. 2007. Parkinsons [Data set]. UCI Machine Learning Repository. https://doi.org/10.24432/C59C74
Little, MA., McSharry, PE., Hunter, EJ., Spielman, J., Ramig, LO. 2009. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Transactions on Bio-Medical Engineering, 56(4), 1015–1022. https://doi.org/10.1109/tbme.2008.2005954
Luukka, P. 2011. Feature selection using fuzzy entropy measures with similarity classifier. Expert Systems with Applications, 38(4), 4600–4607. https://doi.org/10.1016/j.eswa.2010.09.133
Molera, LM. 2024. Machine learning Q&A: All about model validation. Mathworks.com. https://ch.mathworks.com/campaigns/offers/next/all-about-model-validation.html
Nareklishvili, M., Geitle, M. 2024. Deep ensemble transformers for dimensionality reduction. IEEE Transactions on Neural Networks and Learning Systems, 1–12. https://doi.org/10.1109/tnnls.2024.3357621
Orozco-Arroyave, JR., Vásquez-Correa, JC., Honig, F., Arias-Londono, JD., Vargas-Bonilla, JF., Skodda, S., Rusz, J., Noth, E. 2016. Towards an automatic monitoring of the neurological state of Parkinson’s patients from speech. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Rusz, J., Cmejla, R., Ruzickova, H., Ruzicka, E. 2011. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. The Journal of the Acoustical Society of America, 129(1), 350–367. https://doi.org/10.1121/1.3514381
Senturk, ZK. 2020. Early diagnosis of Parkinson’s disease using machine learning algorithms. Medical Hypotheses, 138(109603), 109603. https://doi.org/10.1016/j.mehy.2020.109603
Sharma, P., Sundaram, S., Sharma, M., Sharma, A., Gupta, D. 2019. Diagnosis of Parkinson’s disease using modified grey wolf optimization. Cognitive Systems Research, 54, 100–115. https://doi.org/10.1016/j.cogsys.2018.12.002
Smith, KM., Caplan, DN. 2018. Communication impairment in Parkinson’s disease: Impact of motor and cognitive symptoms on speech and language. Brain and Language, 185, 38–46. https://doi.org/10.1016/j.bandl.2018.08.002
Virameteekul, S., Revesz, T., Jaunmuktane, Z., Warner, TT., De Pablo-Fernández, E. 2023. Clinical diagnostic accuracy of Parkinson’s disease: Where do we stand? Movement Disorders: Official Journal of the Movement Disorder Society, 38(4), 558–566. https://doi.org/10.1002/mds.29317
Vu, TA., Ha, NTT., Duc, LM., Huy, HQ., Dung, NV., Huong, PTV., Thanh, NT. 2023. A comparison of machine learning algorithms for Parkinson’s disease detection. 2023 12th International Conference on Control, Automation and Information Sciences (ICCAIS).
Xie, X., Ho, JWK., Murphy, C., Kaiser, G., Xu, B., Chen, TY. 2011. Testing and validating machine learning classifiers by metamorphic testing. The Journal of Systems and Software, 84(4), 544–558. https://doi.org/10.1016/j.jss.2010.11.920

Abstract

Parkinson’s disease (PD), a neurological disorder, negatively affects the lives of patients and their caregivers. PD is very difficult to diagnose early from a person’s clinical characteristics, but it can be diagnosed using voice recordings. However, the inconsistent performance of models obtained by evaluating voice recordings with machine learning techniques limits their usefulness as a diagnostic aid for physicians. This study used a database of 195 voice recordings obtained from 31 individuals, 23 of whom have PD. The recordings were classified as healthy or patient based on the 22 features in the database. The training and test data were selected using split ratios of 90/10, 80/20, 70/30, 50/50 and 30/70, respectively. Each split ratio was evaluated using 10-fold cross-validation, 5-fold cross-validation, holdout validation and resubstitution validation in the training phase, the initial process that directly affects the other classification procedures. Classification was performed using quadratic discriminant analysis, support vector machine, ensemble bagged tree, k-nearest neighbours and neural network classifiers. All procedures were repeated 10 times to account for the randomness of the splits and to ensure consistent results. The k-nearest neighbours classifier with an 80/20 split ratio and 10-fold cross-validation was found to be the most successful of the compared methods, with a test accuracy of 95.64±3.21%. This shows that much more successful results can be obtained merely by analysing the effects of the existing parameters of the classifiers.
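The protocol described above (a random train/test split, k-fold cross-validation inside the training portion, and repetition over several random splits) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the author's code: it uses only the Python standard library, a plain k-nearest-neighbour classifier with squared Euclidean distance, and hypothetical function names (`knn_predict`, `kfold_accuracy`, `repeated_split`); the abstract does not state the number of neighbours, the distance measure, or any feature scaling.

```python
import random
from statistics import mean, stdev


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training points (ties broken arbitrarily)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train_X, train_y, test_X, test_y, k=1):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def kfold_accuracy(X, y, folds, k, rng):
    """Mean validation accuracy over `folds` disjoint folds of the given data."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    scores = []
    for f in range(folds):
        val = idx[f::folds]                      # every folds-th index forms one fold
        val_set = set(val)
        tr = [i for i in idx if i not in val_set]
        scores.append(accuracy([X[i] for i in tr], [y[i] for i in tr],
                               [X[i] for i in val], [y[i] for i in val], k))
    return mean(scores)


def repeated_split(X, y, train_frac=0.8, folds=10, k=1, repeats=10, seed=0):
    """Repeat a random split; return (mean, std) of CV accuracy and of test accuracy."""
    rng = random.Random(seed)
    cv_scores, test_scores = [], []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        tr, te = idx[:cut], idx[cut:]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        # Validation score is estimated on the training portion only.
        cv_scores.append(kfold_accuracy(Xtr, ytr, folds, k, rng))
        # Test score uses the held-out portion, which the CV step never sees.
        test_scores.append(accuracy(Xtr, ytr, [X[i] for i in te], [y[i] for i in te], k))
    return (mean(cv_scores), stdev(cv_scores)), (mean(test_scores), stdev(test_scores))
```

With `train_frac=0.8` and `folds=10` this corresponds to the 80/20 split with 10-fold cross-validation; changing `train_frac` to 0.9, 0.7, 0.5 or 0.3 and `folds` to 5 covers the other configurations listed. Resubstitution validation would instead score the model on the same data it was trained on, for example `accuracy(Xtr, ytr, Xtr, ytr, k)`.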

Keywords

Classification, cross-validation, machine learning, repeated train/test splitting

There are 31 references in total.

Details

Primary Language	English
Subjects	Biomedical Diagnosis, Biomechanical Engineering
Section	Research Article
Authors	Mustafa Alptekin Engin 0000-0003-3399-9343
Publication Date	23 July 2024
Submission Date	15 May 2024
Acceptance Date	31 May 2024
Published in Issue	Year 2024 Volume: 14 Issue: 2

Cite

APA	Engin, M. A. (2024). Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods. Karaelmas Fen Ve Mühendislik Dergisi, 14(2), 134-147. https://doi.org/10.7212/karaelmasfen.1484222
AMA	Engin MA. Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods. Karaelmas Fen ve Mühendislik Dergisi. July 2024;14(2):134-147. doi:10.7212/karaelmasfen.1484222
Chicago	Engin, Mustafa Alptekin. “Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods”. Karaelmas Fen Ve Mühendislik Dergisi 14, no. 2 (July 2024): 134-47. https://doi.org/10.7212/karaelmasfen.1484222.
EndNote	Engin MA (01 July 2024) Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods. Karaelmas Fen ve Mühendislik Dergisi 14 2 134–147.
IEEE	M. A. Engin, “Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods”, Karaelmas Fen ve Mühendislik Dergisi, vol. 14, no. 2, pp. 134–147, 2024, doi: 10.7212/karaelmasfen.1484222.
ISNAD	Engin, Mustafa Alptekin. “Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods”. Karaelmas Fen ve Mühendislik Dergisi 14/2 (July 2024), 134-147. https://doi.org/10.7212/karaelmasfen.1484222.
JAMA	Engin MA. Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods. Karaelmas Fen ve Mühendislik Dergisi. 2024;14:134–147.
MLA	Engin, Mustafa Alptekin. “Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods”. Karaelmas Fen Ve Mühendislik Dergisi, vol. 14, no. 2, 2024, pp. 134-47, doi:10.7212/karaelmasfen.1484222.
Vancouver	Engin MA. Parkinson’s Disease Detection Via Machine Learning Using Data Splitting and Validation Methods. Karaelmas Fen ve Mühendislik Dergisi. 2024;14(2):134-47.
