Stuttering is a complex speech disorder characterized by disruptions in the fluency of verbal expression, often creating communication challenges for those affected. Accurate identification and classification of stuttering types can greatly benefit persons who stutter (PWS), especially in an era where voice technologies are increasingly ubiquitous and integrated into daily life. In this work, we adapt a simple yet effective Siamese network architecture, known for its ability to learn from paired inputs, to extract novel features from paired speech segments. Our approach leverages these features to enhance the detection and identification of stuttering events. For our experiments, we rely on a subset of the SEP-28k stuttering dataset, initially implementing a single-task model and gradually evolving it into a more sophisticated multi-task model. Our results demonstrate that transitioning the network from a single-task learner to a multi-task learner, coupled with the integration of auxiliary classification heads, significantly improves the identification of stuttering types, even with a relatively small dataset.
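The core idea sketched in the abstract — a shared-weight (Siamese) encoder applied to both segments of a pair, with an auxiliary classification head on top of the shared embedding — can be illustrated minimally as follows. This is a hedged sketch, not the paper's implementation: the feature dimension, embedding size, number of stutter classes, and all function names (`encode`, `pair_distance`, `classify`) are illustrative assumptions, and a single linear layer stands in for the actual encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 40 acoustic features per
# segment, a 16-dim embedding, and 5 stuttering event types.
N_FEAT, N_EMB, N_CLASSES = 40, 16, 5

# The "Siamese" property: ONE set of encoder weights is shared by both
# branches, so both segments of a pair are embedded identically.
W_enc = rng.normal(scale=0.1, size=(N_FEAT, N_EMB))

def encode(x):
    """Shared embedding branch (one linear layer + ReLU for brevity)."""
    return np.maximum(x @ W_enc, 0.0)

def pair_distance(x1, x2):
    """Euclidean distance between the two shared embeddings; a small
    distance means the pair is judged similar."""
    return np.linalg.norm(encode(x1) - encode(x2))

# Auxiliary classification head reusing the same shared embedding --
# the multi-task extension described in the abstract.
W_cls = rng.normal(scale=0.1, size=(N_EMB, N_CLASSES))

def classify(x):
    """Softmax probabilities over the hypothetical stutter classes."""
    logits = encode(x) @ W_cls
    e = np.exp(logits - logits.max())
    return e / e.sum()

seg_a = rng.normal(size=N_FEAT)  # stand-ins for two speech segments
seg_b = rng.normal(size=N_FEAT)
d = pair_distance(seg_a, seg_b)  # pair-similarity feature
p = classify(seg_a)              # auxiliary per-class probabilities
```

In a trained version, `W_enc` and `W_cls` would be learned jointly, with a contrastive loss on `pair_distance` and a cross-entropy loss on the auxiliary head sharing gradients through the common encoder.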
| Primary Language | English |
| --- | --- |
| Subjects | Signal Processing |
| Journal Section | Research Article |
| Authors | |
| Publication Date | December 27, 2024 |
| Submission Date | August 25, 2024 |
| Acceptance Date | December 4, 2024 |
| Published in Issue | Year 2024, Volume: 12, Issue: 2 |
Manas Journal of Engineering