Examination of Energy Based Voice Activity Detection Algorithms for Noisy Speech Signals
Abstract
This paper examines the behavior of two energy-based voice activity detection (VAD) algorithms on noisy input signals. Both detectors use time-domain methods to find speech boundaries. Time-domain short-time energy (STE) features and/or the zero-crossing rate of the speech signals are used to evaluate the performance of the methods. In the first stage of both algorithms, the STE is calculated for each speech segment. Energy ratios and threshold values are then used to detect voice activity in the signal. The decision threshold is calculated from the average STE of an initial silence period. The methods are tested on clean speech samples and on noisy speech signals at different SNR levels. The results indicate that both methods achieve reasonable accuracy down to an SNR of nearly 0 dB, with slowly decreasing performance. Below 0 dB SNR, however, both methods lose their effectiveness.
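The general scheme described in the abstract (per-frame STE, a threshold derived from the average STE of an initial silence period, and a per-frame speech/silence decision) can be illustrated with a minimal Python sketch. This is not the implementation of either examined algorithm: the frame length, hop size, number of initial silence frames, and the threshold ratio of 3.0 are all assumed values chosen for illustration, and the function names are hypothetical. The zero-crossing rate is computed as a feature only and is not used in the decision here.

```python
import math
import random


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; an incomplete tail is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_vad(x, frame_len=160, hop=80, n_silence_frames=10, ratio=3.0):
    """Flag frames whose STE exceeds ratio * (average STE of the initial frames).

    All default parameter values are assumptions, not values from the paper.
    """
    energies = [short_time_energy(f) for f in frame_signal(x, frame_len, hop)]
    n = min(n_silence_frames, len(energies))
    if n == 0:
        return [], 0.0, energies
    noise_floor = sum(energies[:n]) / n  # average STE of the assumed silence period
    threshold = ratio * noise_floor
    return [e > threshold for e in energies], threshold, energies


def make_demo_signal(fs=8000, seed=0):
    """Synthetic input: 0.5 s of weak noise, then 0.5 s of a 200 Hz tone plus noise."""
    rng = random.Random(seed)
    half = fs // 2
    noise = [rng.gauss(0.0, 0.01) for _ in range(2 * half)]
    tone = [0.0] * half + [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(half)]
    return [a + b for a, b in zip(noise, tone)]


signal = make_demo_signal()
decisions, threshold, energies = energy_vad(signal)
```

On this synthetic input the frames in the noise-only first half fall below the threshold and the frames in the tone region exceed it. Because the threshold is tied to the measured noise floor, raising the noise level toward the signal level narrows the gap between the two classes, which is consistent with the degradation the abstract reports below 0 dB SNR.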
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Authors
Publication Date
October 31, 2019
Submission Date
August 1, 2019
Acceptance Date
October 24, 2019
Published in Issue
Year 2019
Cited By
MATHEMATICAL MODEL OF THE SYSTEM OF ACTIVE PROTECTION AGAINST EAVESDROPPING OF SPEECH INFORMATION ON THE SCRAMBLER GENERATOR
EUREKA: Physics and Engineering
https://doi.org/10.21303/2461-4262.2020.001241
Development of a mathematical model of scrambler-type speech-like interference generator for system of prevent speech information from leaking via acoustic and vibration channels
Technology audit and production reserves
https://doi.org/10.15587/2312-8372.2019.185133
Active Speaker Detection Using Audio, Visual, and Depth Modalities: A Survey
IEEE Access
https://doi.org/10.1109/ACCESS.2024.3426670
Enhancing video salient object detection via SAM-based multimodal energy prompting
Pattern Analysis and Applications
https://doi.org/10.1007/s10044-025-01531-9