COMPUTER BASED VOICE ANALYSIS ON MEDICAL DIAGNOSIS

Year 2012, Volume: 14 Issue: 1, 11 - 22, 01.01.2012

Abstract

Voice quality is altered by many voice disorders that arise from pathological conditions of the
voice-production organs. The aim of this study is to help clinicians diagnose voice disorders
by means of non-invasive analysis. In this work, the feature vectors consist of the amplitude
perturbation quotient, the pitch period perturbation quotient, the degree of unvoiceness, Teager
energy operator averages of the wavelet transform coefficients, and higher-order statistics
parameters. Voice segments belonging to the different pathological classes or to the normal class
were classified by multilayer perceptron networks trained with backpropagation. Three learning
algorithms were used to train these networks: resilient backpropagation, scaled conjugate gradient,
and Broyden-Fletcher-Goldfarb-Shanno. According to the results of the simulation studies,
the scaled conjugate gradient algorithm gave the best results.
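
Two of the features named in the abstract can be illustrated with a short sketch. The Teager energy operator below uses the standard discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The perturbation quotient uses a generic k-point smoothed definition; the abstract does not state the window length or the exact formula used in the study, so the `window` parameter and both function names are assumptions made for illustration only, and the operator is applied here to a raw signal rather than to wavelet coefficients.

```python
import math
from statistics import mean


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The result has len(x) - 2 values because the first and last
    samples lack one neighbour.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def perturbation_quotient(values, window=3):
    """Relative k-point perturbation quotient of a per-cycle sequence.

    `values` holds one measurement per glottal cycle (peak amplitudes
    for an amplitude quotient, period lengths for a pitch period
    quotient). Each value is compared with the mean of the `window`
    values centred on it, and the average absolute deviation is
    normalised by the overall mean.
    Assumption: the window length is illustrative, not the paper's.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(values) < window:
        raise ValueError("need at least `window` values")
    half = window // 2
    deviations = [
        abs(values[i] - mean(values[i - half:i + half + 1]))
        for i in range(half, len(values) - half)
    ]
    return mean(deviations) / mean(values)


# For a pure tone A*cos(W*n + phase) the operator is constant and
# equals A^2 * sin(W)^2, which makes a convenient sanity check.
tone = [math.cos(0.3 * n) for n in range(50)]
tone_energy = teager_energy(tone)

# A perfectly regular cycle sequence has zero perturbation.
steady_pq = perturbation_quotient([5.0] * 10)
```

In a feature vector of the kind the abstract describes, averages of `teager_energy` outputs and the two perturbation quotients would each contribute one or more entries per voice segment.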

References

  • P. Boersma (1993): “Accurate Short-term Analysis of the Fundamental Frequency and the Harmonics-to-Noise Ratio of a Sampled Sound”, Proceedings of the Institute of Phonetic Sciences, Vol. 17, pp. 97-110.
  • B. Boyanov, S. Hadjitodorov (1997): “Acoustic Analysis of Pathological Voices”, IEEE Engineering in Medicine and Biology, Vol. 16, pp. 74-81.
  • E. J. Ciaccio, S. M. Dunn, M. Akay (1993): “Biosignal Pattern Recognition and Interpretation Systems”, IEEE Engineering in Medicine and Biology Magazine, Vol. 12, pp. 89-97.
  • H. Demuth, M. Beale (2002): “Neural Network Toolbox for Use with MATLAB”.
  • J. L. Godino-Llorente, S. Aguilera-Navarro, P. Gomez-Vilda (2000): “Non-supervised Neural Net Applied to the Detection of Voice Impairment”, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3594-3597.
  • S. Hadjitodorov, B. Boyanov, B. Teston (2000): “Laryngeal Pathology Detection by Means of Class-specific Neural Maps”, IEEE Transactions on Information Technology in Biomedicine, Vol. 4, pp. 68-73.
  • J. H. L. Hansen, L. Gavidia-Ceballos, J. F. Kaiser (1998): “A Nonlinear Operator-based Speech Feature Analysis Method with Application to Vocal Fold Pathology Assessment”, IEEE Transactions on Biomedical Engineering, Vol. 45, pp. 300-313.
  • H. K. Heris, B. S. Aghazadeh, M. Nikkhah-Bahrami (2009): “Optimal Feature Selection for the Assessment of Vocal Fold Disorders”, Computers in Biology and Medicine, Vol. 39, pp. 860-868.
  • F. Jabloun, A. E. Çetin, E. Erzin (1999): “Teager Energy Based Feature Parameters for Speech Recognition in Car Noise”, Vol. 6, pp. 259-261.
  • C. W. Jo, H. Kim (1999): “Classification of Pathological Voice into Normal/Benign/Malign State”, Proceedings of Eurospeech, pp. 571-574.
  • M. A. Kılıç, E. Okur (2001): “CSL ve Dr. Speech ile Ölçülen Temel Frekans ve Pertürbasyon Değerlerinin Karşılaştırılması”, KBB İhtisas Dergisi, Vol. 8, pp. 152-157.
  • C. Manfredi, M. D’Aniello, P. Bruscagliani, A. Ismaelli (2000): “A Comparative Analysis of Fundamental Frequency Estimation Methods with Application to Pathological Voices”, Medical Engineering & Physics, Vol. 22, pp. 135-147.
  • C. E. Martinez, R. L. Hugo (2000): “Acoustic Analysis of Speech for Detection of Laryngeal Pathologies”, Proceedings of the 22nd Annual EMBS International Conference, pp. 2369-2372.
  • M. F. Moller (1993): “A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning”, Neural Networks, Vol. 6, pp. 525-533.
  • E. Nemer, R. Goubran, S. Mahmoud (2001): “Robust Voice Activity Detection Using Higher-order Statistics in the LPC Residual Domain”, IEEE Transactions on Speech and Audio Processing, Vol. 9, pp. 217-231.
  • S. Osowski, T. H. Linh (2001): “ECG Beat Recognition Using Fuzzy Hybrid Neural Network”, IEEE Transactions on Biomedical Engineering, Vol. 48, pp. 1265-1271.
  • J. C. Principe, N. R. Euliano, W. C. Lefebvre (2000): “Neural and Adaptive Systems: Fundamentals through Simulations”, John Wiley and Sons Inc.
  • M. M. Sondhi (1968): “New Methods of Pitch Extraction”, IEEE Transactions on Audio and Electroacoustics, Vol. 16, pp. 262-266.
  • A. H. Tewfik, D. Sinha, P. Jorgensen (1992): “On the Optimal Choice of a Wavelet for Signal Representation”, IEEE Transactions on Information Theory, Vol. 38, pp. 747-765.
  • Z. Tüfekci, J. N. Gowdy (2000): “Feature Extraction Using Discrete Wavelet Transform for Speech Recognition”, Proceedings of the IEEE Southeastcon, pp. 116-123.
  • V. Uloza, A. Verikas, M. Bacauskiene, A. Gelzinis, R. Pribuisiene, M. Kaseta, V. Saferis (2011): “Categorizing Normal and Pathological Voices: Automated and Perceptual Categorization”, Journal of Voice, Vol. 25, pp. 700-708.
  • P. Veprek, M. S. Scordilis (2002): “Analysis, Enhancement and Evaluation of Five Pitch Determination Techniques”, Speech Communication, Vol. 37, pp. 249-270.
  • X. Wang, J. Zhang, Y. Yan (2011): “Discrimination Between Pathological and Normal Voices Using GMM-SVM Approach”, Journal of Voice, Vol. 25, pp. 38-43.
  • E. Yumoto, W. J. Gould (1982): “Harmonics-to-Noise Ratio as an Index of the Degree of Hoarseness”, Journal of the Acoustical Society of America, Vol. 71, pp. 1544-1550.

There are 24 citations in total.

Details

Other ID JA56HG82JM
Journal Section Research Article
Authors

Erkan Zeki Engin

Mehmet Engin

Publication Date January 1, 2012
Published in Issue Year 2012 Volume: 14 Issue: 1

Cite

APA Engin, E. Z., & Engin, M. (2012). BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, 14(1), 11-22.
AMA Engin EZ, Engin M. BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI. DEUFMD. January 2012;14(1):11-22.
Chicago Engin, Erkan Zeki, and Mehmet Engin. “BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi 14, no. 1 (January 2012): 11-22.
EndNote Engin EZ, Engin M (January 1, 2012) BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 14 1 11–22.
IEEE E. Z. Engin and M. Engin, “BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI”, DEUFMD, vol. 14, no. 1, pp. 11–22, 2012.
ISNAD Engin, Erkan Zeki - Engin, Mehmet. “BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 14/1 (January 2012), 11-22.
JAMA Engin EZ, Engin M. BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI. DEUFMD. 2012;14:11–22.
MLA Engin, Erkan Zeki and Mehmet Engin. “BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, vol. 14, no. 1, 2012, pp. 11-22.
Vancouver Engin EZ, Engin M. BİLGİSAYAR TABANLI SES ANALİZİNİN TIBBİ TANIDA KULLANILMASI. DEUFMD. 2012;14(1):11-22.

Dokuz Eylül Üniversitesi, Mühendislik Fakültesi Dekanlığı Tınaztepe Yerleşkesi, Adatepe Mah. Doğuş Cad. No: 207-I / 35390 Buca-İZMİR.