Research Article

Detection and Identification of Stuttering Types Using Siamese Network

Year 2024, Volume: 12 Issue: 2, 208 - 214, 27.12.2024
https://doi.org/10.51354/mjen.1538494

Abstract

Stuttering is a complex speech disorder characterized by disruptions in the fluency of verbal expression, often leading to challenges in communication for those affected. Accurate identification and classification of stuttering types can greatly benefit persons who stutter (PWS), especially in an era where voice technologies are becoming increasingly ubiquitous and integrated into daily life. In this work, we adapt a simple yet effective Siamese network architecture, known for its capability to learn from paired speech segments, to extract novel features from audio speech data. Our approach leverages these features to enhance the detection and identification of stuttering events. For our experiments, we rely on a subset of the SEP-28k stuttering dataset, initially implementing a single-task model and gradually evolving it into a more sophisticated multi-task model. Our results demonstrate that transitioning the network from a single-task learner to a multi-task learner, coupled with the integration of auxiliary classification heads, significantly improves the identification of stuttering types, even with a relatively small dataset.
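
The abstract describes the approach only at a high level. As a rough illustration of the general idea (a shared Siamese encoder applied to paired speech segments, trained jointly on a pair-similarity objective and on auxiliary stuttering-type classification heads), a minimal PyTorch sketch is given below. The framework choice, layer sizes, input features, number of dysfluency classes, and loss weighting are all assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a Siamese encoder with auxiliary classification heads.
# Hypothetical layer sizes, features, and losses; NOT the authors' implementation.
import torch
import torch.nn as nn

class SiameseStutterNet(nn.Module):
    def __init__(self, n_mels=40, embed_dim=128, n_types=5):
        super().__init__()
        # Shared encoder, applied to both segments of a pair (weight sharing = "Siamese").
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )
        # Main (pairwise) head: do the two segments contain the same dysfluency type?
        self.similarity_head = nn.Linear(embed_dim, 1)
        # Auxiliary head: per-segment stuttering-type classification (multi-task part).
        self.type_head = nn.Linear(embed_dim, n_types)

    def forward(self, seg_a, seg_b):
        emb_a, emb_b = self.encoder(seg_a), self.encoder(seg_b)
        same_logit = self.similarity_head(torch.abs(emb_a - emb_b))
        return same_logit, self.type_head(emb_a), self.type_head(emb_b)

# Toy usage with random tensors standing in for (batch, mel bins, frames) features.
model = SiameseStutterNet()
seg_a, seg_b = torch.randn(8, 40, 300), torch.randn(8, 40, 300)
same_logit, type_a, type_b = model(seg_a, seg_b)
loss = (nn.BCEWithLogitsLoss()(same_logit.squeeze(1), torch.randint(0, 2, (8,)).float())
        + nn.CrossEntropyLoss()(type_a, torch.randint(0, 5, (8,)))
        + nn.CrossEntropyLoss()(type_b, torch.randint(0, 5, (8,))))
loss.backward()
```

In this sketch the pair-similarity objective plays the role of the single task, while the per-segment type heads act as the auxiliary classification heads mentioned in the abstract; removing them recovers a single-task variant of the same network.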

References

  • [1] Amruth, V., Lavanya, K., Manoj, N., Umme, H., and Deepika, M. B. (2020). A novel approach for stutter speech recognition and correction. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 8:544–547.
  • [2] Arjun, K. N., Karthik, S., Kamalnath, D., Chanda, P., and Tripathi, S. (2020). Automatic correction of stutter in dysfluent speech. Procedia Computer Science, 171:1363–1370.
  • [3] Baevski, A., Zhou, Y., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems, pages 12449–12460.
  • [4] Bayerl, S., von Gudenberg, A. W., Hönig, F., Nöth, E., and Riedhammer, K. (2022a). KSoF: The Kassel State of Fluency dataset – a therapy centered dataset of stuttering. In Proceedings of the Language Resources and Evaluation Conference LREC, pages 1780–1787. European Language Resources Association.
  • [5] Bayerl, S. P., Gerczuk, M., Batliner, A., Bergler, C., Amiriparian, S., Schuller, B., Nöth, E., and Riedhammer, K. (2023). Classification of stuttering – the ComParE challenge and beyond. Computer Speech & Language, 81.
  • [6] Bayerl, S. P., Wagner, D., Nöth, E., Bocklet, T., and Riedhammer, K. (2022b). The influence of dataset partitioning on dysfluency detection systems. In Sojka, P., Horák, A., Kopeček, I., and Pala, K., editors, Text, Speech, and Dialogue, pages 423–436, Cham. Springer International Publishing.
  • [7] Bayerl, S. P., Wagner, D., Nöth, E., and Riedhammer, K. (2022c). Detecting dysfluencies in stuttering therapy using wav2vec 2.0. In Interspeech 2022. ISCA.
  • [8] Dash, A., Subramani, N., Manjunath, T., Yaragarala, V., and Tripathi, S. (2018). Speech recognition and correction of a stuttered speech. In 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 1757–1760.
  • [9] Heeman, P., Lunsford, R., McMillin, A., and Yaruss, J. S. (2016). Using clinician annotations to improve automatic speech recognition of stuttered speech. In Interspeech, pages 2651–2655.
  • [10] Howell, P., Davis, S., and Bartrip, J. (2009). The UCLASS archive of stuttered speech. Journal of Speech, Language, and Hearing Research, 52:556–596.
  • [11] Howell, P. and Sackin, S. (1995). Automatic recognition of repetitions and prolongations in stuttered speech. In Proceedings of the First World Congress on Fluency Disorders, pages 372–374.
  • [12] Kourkounakis, T., Hajavi, A., and Etemad, A. (2020). Detecting multiple speech dysfluencies using a deep residual network with bidirectional long-short-term memory. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, pages 6089–6093.
  • [13] Kourkounakis, T., Hajavi, A., and Etemad, A. (2021). FluentNet: End-to-end detection of stuttered speech dysfluencies with deep learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:2986–2999.
  • [14] Lea, C., Mitra, V., Joshi, A., Kajarekar, S., and Bigham, J. P. (2021). SEP-28k: A dataset for stuttering event detection from podcasts with people who stutter. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, pages 6798–6802.
  • [15] Nöth, E., Niemann, H., Haderlein, T., Decher, M., Eysholdt, U., Rosanowski, F., and Wittenberg, T. (2000). Automatic stuttering recognition using hidden Markov models. In INTERSPEECH.
  • [16] Ratner, N. B. and MacWhinney, B. (2018). Fluency Bank: A new resource for fluency research and practice. Journal of Fluency Disorders, 56:69–80.
  • [17] Sheikh, S. A., Sahidullah, M., Hirsch, F., and Ouni, S. (2021). StutterNet: Stuttering detection using time delay neural network. In 2021 29th European Signal Processing Conference (EUSIPCO), pages 426–430.
  • [18] Sheikh, S. A., Sahidullah, M., Hirsch, F., and Ouni, S. (2022a). Machine learning for stuttering identification: Review, challenges and future directions. Neurocomputing, 514:385–402.
  • [19] Sheikh, S. A., Sahidullah, M., Hirsch, F., and Ouni, S. (2022b). Robust stuttering detection via multi-task and adversarial learning. In European Signal Processing Conference (EUSIPCO).

Details

Primary Language English
Subjects Signal Processing
Journal Section Research Article
Authors

Venera Adanova 0000-0001-7247-0288

Maksat Atagoziev 0000-0001-7799-7636

Publication Date December 27, 2024
Submission Date August 25, 2024
Acceptance Date December 4, 2024
Published in Issue Year 2024 Volume: 12 Issue: 2

Cite

APA Adanova, V., & Atagoziev, M. (2024). Detection and Identification of Stuttering Types Using Siamese Network. MANAS Journal of Engineering, 12(2), 208-214. https://doi.org/10.51354/mjen.1538494
AMA Adanova V, Atagoziev M. Detection and Identification of Stuttering Types Using Siamese Network. MJEN. December 2024;12(2):208-214. doi:10.51354/mjen.1538494
Chicago Adanova, Venera, and Maksat Atagoziev. “Detection and Identification of Stuttering Types Using Siamese Network”. MANAS Journal of Engineering 12, no. 2 (December 2024): 208-14. https://doi.org/10.51354/mjen.1538494.
EndNote Adanova V, Atagoziev M (December 1, 2024) Detection and Identification of Stuttering Types Using Siamese Network. MANAS Journal of Engineering 12 2 208–214.
IEEE V. Adanova and M. Atagoziev, “Detection and Identification of Stuttering Types Using Siamese Network”, MJEN, vol. 12, no. 2, pp. 208–214, 2024, doi: 10.51354/mjen.1538494.
ISNAD Adanova, Venera - Atagoziev, Maksat. “Detection and Identification of Stuttering Types Using Siamese Network”. MANAS Journal of Engineering 12/2 (December 2024), 208-214. https://doi.org/10.51354/mjen.1538494.
JAMA Adanova V, Atagoziev M. Detection and Identification of Stuttering Types Using Siamese Network. MJEN. 2024;12:208–214.
MLA Adanova, Venera and Maksat Atagoziev. “Detection and Identification of Stuttering Types Using Siamese Network”. MANAS Journal of Engineering, vol. 12, no. 2, 2024, pp. 208-14, doi:10.51354/mjen.1538494.
Vancouver Adanova V, Atagoziev M. Detection and Identification of Stuttering Types Using Siamese Network. MJEN. 2024;12(2):208-14.
