Title :
Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise in Multimedia Communications
Author :
Syed, Waheeduddin Q. ; Wu, Hsiao-Chun
Author_Institution :
Louisiana State Univ., Baton Rouge
Abstract :
Voice activity detection (VAD) is crucial in all kinds of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector based on Mel-energy features and an adaptive threshold related to the signal-to-noise ratio (SNR) estimates. In this paper, we first justify the robustness of the Bayes classifier using the Mel-energy features over that using the Fourier spectral features in various noise environments. Then, we design an algorithm using a dynamic Mel-energy estimator and an adaptive threshold that depends on the SNR estimates. In addition, a realignment scheme is incorporated to correct the sparse-and-spurious noise estimates. Numerous simulations are carried out to evaluate the performance of our proposed VAD method, and comparisons are made with two existing representative schemes, namely the VAD using the likelihood ratio test with Fourier spectral energy features and the VAD based on the enhanced time-frequency parameters. Three types of noise, namely white noise (stationary), babble noise (nonstationary) and vehicular noise (nonstationary), were artificially added by computer for our experiments. Our proposed VAD algorithm significantly outperforms the other existing methods, as illustrated by the corresponding receiver operating curves (ROCs). Finally, we demonstrate one of the major applications of our new robust VAD scheme, namely speech waveform compression, and quantify its effectiveness in terms of compression efficiency.
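The general idea in the abstract (frame-wise Mel-energy features compared against a threshold that adapts to an SNR estimate) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so the frame length, hop size, number of filters, smoothing constants `alpha` and `beta`, and the rule mapping the SNR estimate to a decision margin are all assumptions chosen for illustration. The Bayes classifier and the realignment scheme mentioned in the abstract are not modeled.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale (one row per filter)."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_energy(signal, sample_rate, frame_len=256, hop=128, n_filters=20):
    """Total Mel-band energy of each Hann-windowed frame (assumed framing)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    energies = np.empty(n_frames)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        energies[t] = np.sum(bank @ power)
    return energies

def adaptive_vad(energies, init_frames=10, alpha=0.95, beta=0.9):
    """Flag frames as speech with an SNR-dependent margin (hypothetical rule).

    The noise floor is initialized from the first `init_frames` frames, which
    are assumed to be speech-free, and is tracked only on non-speech frames.
    """
    eps = 1e-12
    noise = np.mean(energies[:init_frames]) + eps
    snr_est_db = 0.0
    decisions = np.zeros(len(energies), dtype=bool)
    for t, e in enumerate(energies):
        frame_snr_db = 10.0 * np.log10(max(e / noise, eps))
        # Assumed mapping: margin grows from 3 dB to 8 dB with the SNR estimate.
        margin_db = 3.0 + 0.25 * np.clip(snr_est_db, 0.0, 20.0)
        if frame_snr_db > margin_db:
            decisions[t] = True
            snr_est_db = beta * snr_est_db + (1.0 - beta) * frame_snr_db
        else:
            noise = alpha * noise + (1.0 - alpha) * e
    return decisions
```

For waveform compression, frames flagged `False` could then be dropped or coded at a lower rate. How the paper actually exploits the decisions is not stated in the abstract.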
Keywords :
Fourier analysis; data compression; multimedia communication; speech coding; white noise; Fourier spectral features; Mel-energy features; babble noise; multimedia communications; nonstationary noise; receiver operating curves; robust adaptive voice activity detection; signal-to-noise ratio; speech waveform compression; vehicular noise; white noise; Adaptive signal detection; Algorithm design and analysis; Detectors; Heuristic algorithms; Multimedia communication; Noise robustness; Signal to noise ratio; Speech enhancement; White noise; Working environment noise;
Conference_Title :
Global Telecommunications Conference, 2007. GLOBECOM '07. IEEE
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-1042-2
Electronic_ISBN :
978-1-4244-1043-9
DOI :
10.1109/GLOCOM.2007.586