COUGH SOUND ANALYSIS WITH DEEP LEARNING: THE IMPACT OF DATA AUGMENTATION ON RESPIRATORY DISEASE CLASSIFICATION

Ayşen Özün Türkçetin; Turgay Koç; Sule Cilekar

Research Article

Year 2025, Volume: 13, Issue: 3, pp. 896-910, 30.09.2025

Abstract

Respiratory diseases affect millions of people worldwide, making efficient, early diagnostic tools necessary to limit complications. This study proposes a robust and systematic approach for classifying asthma, COPD, pneumonia, and healthy conditions from cough sounds. Mel-frequency cepstral coefficients (MFCCs) were extracted and used to train both a deep learning model (CNN) and traditional classifiers (Random Forest, SVM) under limited and imbalanced data conditions. A major focus was evaluating the impact of data augmentation and model choice on classification performance. Initial results showed that the traditional models outperformed the CNN, which overfit the limited data. However, with progressive augmentation up to 800 synthetic samples per class and the use of Dice Loss, the CNN improved substantially, reaching 84% accuracy and a macro F1 score of 69%. These results highlight the critical role of data augmentation and tailored training strategies in improving the performance of deep learning models for audio-based biomedical classification tasks.
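The abstract reports accuracy and macro F1 side by side and names Dice Loss as part of the training strategy. The following is a minimal sketch of how these quantities are commonly computed, not the authors' implementation: the function names, the smoothing constant `eps`, and the toy labels are all hypothetical, and the Dice formulation shown is the usual soft multi-class variant, which may differ from the one used in the study.

```python
from typing import List, Sequence


def soft_dice_loss(probs: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   n_classes: int,
                   eps: float = 1e-7) -> float:
    """Soft multi-class Dice loss: 1 minus the mean per-class Dice score.

    probs[i][c] is the predicted probability of class c for sample i.
    eps is an assumed smoothing term that avoids division by zero.
    """
    dice_scores: List[float] = []
    for c in range(n_classes):
        # One-hot target and predicted probability for class c.
        target = [1.0 if y == c else 0.0 for y in labels]
        pred = [p[c] for p in probs]
        intersection = sum(t * q for t, q in zip(target, pred))
        denom = sum(target) + sum(pred)
        dice_scores.append((2.0 * intersection + eps) / (denom + eps))
    return 1.0 - sum(dice_scores) / n_classes


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             n_classes: int) -> float:
    """Unweighted mean of per-class F1, so rare classes count equally."""
    f1_scores: List[float] = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n_classes


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 16 samples of class 0, few of the rest.
    y_true = [0] * 16 + [1, 1, 2, 3]
    y_pred = [0] * 16 + [1, 0, 0, 3]
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred, n_classes=4))
```

On this toy data the accuracy is 0.90 while the macro F1 is about 0.65, because one minority class is never predicted correctly. This illustrates why an imbalanced dataset can produce a gap between the two metrics like the one reported in the abstract, and why a loss that averages over classes, such as Dice, is a plausible choice in that setting.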

Keywords

Cough Sound Analysis, Lung Diseases, Deep Learning Models, Data Augmentation, Convolutional Neural Network, Cross Validation, Imbalanced Data

Ethical Statement

Ethics committee approval for the data collected within the scope of the project was obtained from Afyonkarahisar Health Sciences University (reference number 2023/470, code 2011-KAEK-2). The ethics committee reports are provided as an attachment.

Acknowledgments

The dataset used in this study was collected from patients hospitalized in the Department of Chest Diseases, Afyonkarahisar Health Sciences University. This study is part of Ayşen Özün Türkçetin's doctoral dissertation. We thank Afyonkarahisar Health Sciences University for its support during the ethics committee and dataset collection stages.

The article lists 37 references in total.

Details

Primary Language	English
Subjects	Signal Processing
Section	Research Articles
Authors	Ayşen Özün Türkçetin (ORCID 0000-0003-4784-2267), Turgay Koç (ORCID 0000-0002-4846-7772), Sule Cilekar (ORCID 0000-0001-8659-955X)
Publication Date	30 September 2025
Submission Date	8 April 2025
Acceptance Date	30 July 2025
Published in Issue	Year 2025, Volume: 13, Issue: 3

Cite

APA	Türkçetin, A. Ö., Koç, T., & Cilekar, S. (2025). COUGH SOUND ANALYSIS WITH DEEP LEARNING: THE IMPACT OF DATA AUGMENTATION ON RESPIRATORY DISEASE CLASSIFICATION. Mühendislik Bilimleri ve Tasarım Dergisi, 13(3), 896-910.
