DocumentCode :
2029927
Title :
Improved voice activity detection for speech recognition system
Author :
Chin, Siew Wen ; Seng, Kah Phooi ; Ang, Li-Minn ; Lim, King Hann
Author_Institution :
Sch. of Electr. & Electron. Eng., Univ. of Nottingham, Semenyih, Malaysia
fYear :
2010
fDate :
16-18 Dec. 2010
Firstpage :
518
Lastpage :
523
Abstract :
An improved voice activity detection (VAD) method based on a radial basis function neural network (RBF NN) and the continuous wavelet transform (CWT) is presented for speech recognition systems. The input speech signal is analyzed in fixed-size windows using Mel-frequency cepstral coefficients (MFCC). Within each windowed signal, the proposed RBF-CWT VAD algorithm uses the RBF NN to classify the signal as speech or non-speech. When a transition from speech to non-speech, or vice versa, occurs, the energy changes of the CWT coefficients are calculated to localize the final coordinates of the speech start and end points. A conventional VAD with a binary classifier classifies the speech signal using the MFCC at the frame level, which easily captures a large amount of undesired noise; the proposed RBF NN, aided by the CWT, instead analyzes the transformation of the MFCC at the window level, which offers better compensation for noisy signals. Simulation results show improved speech detection precision and overall ASR rate, particularly under noisy conditions, compared with conventional VAD based on the zero-crossing rate, short-term signal energy, and a binary classifier.
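For orientation, the conventional baseline that the abstract compares against (frame-level short-term energy and zero-crossing rate) can be sketched as below. This is a minimal illustration, not the paper's RBF-CWT method: the function names, frame length, hop size, and both thresholds are assumptions chosen for the synthetic example, and no values are taken from the paper.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping fixed-size frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def baseline_vad(samples, frame_len=160, hop=80,
                 energy_thresh=0.01, zcr_thresh=0.25):
    """Label a frame True (speech-like) when its energy is high and its
    zero-crossing rate is low. Thresholds are illustrative assumptions."""
    return [short_term_energy(f) > energy_thresh
            and zero_crossing_rate(f) < zcr_thresh
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic input at 8 kHz: 0.1 s of low-level, high-frequency "noise"
# followed by 0.1 s of a louder 200 Hz tone standing in for voiced speech.
rate = 8000
quiet = [0.001 * math.sin(2 * math.pi * 3000 * n / rate) for n in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
labels = baseline_vad(quiet + tone)
```

Fixed thresholds like these are what makes such a baseline fragile in noise, which is the weakness the abstract says the window-level RBF-CWT approach is meant to address.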
Keywords :
cepstral analysis; pattern classification; radial basis function networks; speech processing; speech recognition; wavelet transforms; ASR rate; binary classifier; mel frequency cepstral coefficient; neural network; radial basis function; short term signal energy; speech recognition; voice activity detection; wavelet transform; Artificial neural networks; Classification algorithms; Continuous wavelet transforms; Mel frequency cepstral coefficient; Signal to noise ratio; Speech; Speech recognition; continuous wavelet transform; mel frequency cepstral coefficient; radial basis function; voice activity detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Symposium (ICS), 2010 International
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-7639-8
Type :
conf
DOI :
10.1109/COMPSYM.2010.5685456
Filename :
5685456