Research Article

Comparison of Different Dimensionality Reduction Methods in the Detection of Parkinson's Disease

Year 2019, Issue: 17, 1164 - 1172, 31.12.2019
https://doi.org/10.31590/ejosat.655795

Abstract

Parkinson's Disease (PD) is a progressive neurological disease that directly affects multiple motor and non-motor functions of individuals. People with PD often experience voice impairment in the first stage of the disease, so voice recordings can be used for early detection. Features extracted from the recordings by signal processing methods are given as input to machine learning methods to detect PD. In this study, features extracted from individuals' voice recordings were given as input to two different machine learning models for the detection of PD. The models were trained with a dataset obtained from the UCI Machine Learning Repository. Two dimensionality reduction methods were applied to the features in order to reduce the complexity of the trained models and to prevent over-fitting. The first method, Principal Component Analysis (PCA), projects the original feature space onto a new subspace with fewer dimensions than the original. To reduce the feature dimension, the components with high variance in the new feature space are selected. In the second method, Recursive Feature Elimination (RFE), relevance scores are assigned to the features by a machine learning method. In the first step, a model that uses the entire feature set is built and a relevance score is calculated for each feature. In the next step, the model is rebuilt without the feature that has the lowest relevance score, and the relevance scores are recalculated. This process continues until the desired number of features remains in the feature set. After dimensionality reduction, Support Vector Machine (SVM) and Gradient Boosting Machine (GBM) classifiers were trained with the selected features. Since the number of instances in the dataset was small, One Person Out Cross Validation (OPOCV) was used in classifier training.
Because the dataset is imbalanced, the F-Measure and the Matthews Correlation Coefficient (MCC) were used along with accuracy in the performance evaluation. Across all experimental results, the highest classification performance was achieved by using only 13 features. The GBM classifier trained with the 13 features selected by RFE obtained an accuracy of 0.881. Accuracy increased by about 2% relative to the results obtained without feature selection. The same increase was seen in the MCC, which indicates how well the classes can be distinguished. While the MCC obtained without dimensionality reduction was 0.62, it increased to 0.67 when feature selection was done with RFE. PCA, the other dimensionality reduction method, did not improve classification performance compared to using no selection, but achieved the same performance with fewer features.
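
The elimination loop and the person-wise validation split described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the study's implementation: the scorer (absolute correlation with the label) is a stand-in assumption for the model-based relevance scores, and all function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature column and the labels."""
    sx, sy = pstdev(column), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    mx, my = mean(column), mean(labels)
    cov = mean((x - mx) * (y - my) for x, y in zip(column, labels))
    return abs(cov / (sx * sy))


def relevance_scores(rows, labels, active):
    """Stand-in scorer (assumption): one score per still-active feature index."""
    return {j: abs_correlation([r[j] for r in rows], labels) for j in active}


def recursive_feature_elimination(rows, labels, n_keep, scorer=relevance_scores):
    """Drop the least relevant feature, rescore, and repeat until n_keep remain."""
    active = list(range(len(rows[0])))
    while len(active) > n_keep:
        scores = scorer(rows, labels, active)
        weakest = min(active, key=lambda j: scores[j])
        active.remove(weakest)
    return active


def leave_one_person_out(person_ids):
    """Yield (train_idx, test_idx) with all recordings of one person held out."""
    for person in sorted(set(person_ids)):
        test = [i for i, p in enumerate(person_ids) if p == person]
        train = [i for i, p in enumerate(person_ids) if p != person]
        yield train, test


# Toy data (invented): feature 0 tracks the label, feature 1 is weak,
# feature 2 is constant. Two recordings per person.
rows = [
    [0.90, 0.2, 1.0],
    [0.80, 0.7, 1.0],
    [0.10, 0.3, 1.0],
    [0.20, 0.6, 1.0],
    [0.95, 0.5, 1.0],
    [0.15, 0.4, 1.0],
]
labels = [1, 1, 0, 0, 1, 0]
person_ids = ["p1", "p1", "p2", "p2", "p3", "p3"]

selected = recursive_feature_elimination(rows, labels, n_keep=1)
folds = list(leave_one_person_out(person_ids))
print("selected feature indices:", selected)
print("number of folds:", len(folds))
```

In the setting of the abstract, the scorer would instead come from a fitted model (for example, SVM weights or GBM feature importances) and `n_keep` would be 13. To avoid optimistic estimates, the elimination would normally be rerun on the training indices of each fold rather than on the full dataset.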

Parkinson Hastalığı Tespitinde Farklı Boyutsallık İndirgeme Yöntemlerinin Karşılaştırılması

Abstract

Parkinson's Disease (PD) is a progressive neurological disease that directly affects multiple motor and non-motor features of individuals. In the first stage of PD, individuals usually experience voice impairments, so voice recordings are used in the early detection of PD. Features extracted from the voice recordings by signal processing methods are given as input to machine learning methods to determine whether an individual has the disease. In this study, features extracted from individuals' voice recordings were given as input to two different machine learning methods, and individuals were classified as Parkinson's patients or healthy. The models were trained with a dataset taken from the UCI Machine Learning Repository. Two different dimensionality reduction methods were applied to the features, both to reduce the complexity of the trained models and to prevent them from over-fitting. In the first method, Principal Component Analysis (PCA), the feature set is projected onto a new subspace that has fewer dimensions than the original. In the new feature space, components with high variance are selected, while components with low variance are discarded. In the second method, Recursive Feature Elimination (RFE), relevance scores are assigned to the features by a machine learning method. In the first step, a model that uses the entire feature set is built and a relevance score is calculated for each feature. In the next step, the feature with the lowest relevance score is discarded, the model is rebuilt, and the relevance scores are recalculated. This process continues until the desired number of features remains in the feature set. With these two dimensionality reduction methods, the dimension of the feature space was reduced, and Support Vector Machine (SVM) and Gradient Boosting Machine (GBM) classifiers were trained with the reduced feature vectors.
Because the dataset has a relatively small number of samples, a leave-one-person-out cross-validation procedure was used to train the classifiers. Because the dataset also has an imbalanced class distribution, the F-measure and the Matthews Correlation Coefficient (MCC) were used together with accuracy to evaluate the models. Across all experimental results, the highest classification performance was reached using only 13 features. Training the GBM classifier with the 13 features selected by RFE gave an accuracy of 0.881. Accuracy increased by about 2% relative to the results obtained without feature selection. The same increase was seen in the MCC, which indicates how well the classes can be separated. While the MCC obtained without dimensionality reduction was 0.62, it rose to 0.67 when features were selected with RFE. PCA, the other dimensionality reduction method, did not improve classification performance over the models without feature selection, but reached the same performance with fewer features.
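
To make the role of the two additional metrics concrete, the sketch below computes accuracy, F-measure, and MCC from confusion-matrix counts using their standard definitions. The counts are invented for illustration and are not results from the study; the function names are hypothetical.

```python
from math import sqrt


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented imbalanced case: 90 patients, 10 healthy subjects, and a
# classifier that labels everyone as a patient.
tp, fp, fn, tn = 90, 10, 0, 0

acc = accuracy(tp, fp, fn, tn)
f1 = f_measure(tp, fp, fn)
corr = mcc(tp, fp, fn, tn)
print(f"accuracy={acc:.3f}  F-measure={f1:.3f}  MCC={corr:.3f}")
```

In this invented case accuracy and F-measure both look high although the classifier never identifies a healthy subject, while the MCC is zero. This is the kind of behavior that motivates reporting MCC alongside accuracy on an imbalanced dataset.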

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors: Hakan Gündüz (ORCID 0000-0003-2152-5490)

Publication Date: December 31, 2019
Published in Issue: Year 2019, Issue 17

Cite

APA Gündüz, H. (2019). Parkinson Hastalığı Tespitinde Farklı Boyutsallık İndirgeme Yöntemlerinin Karşılaştırılması. Avrupa Bilim ve Teknoloji Dergisi, (17), 1164-1172. https://doi.org/10.31590/ejosat.655795