Title :
Computational Auditory Scene Analysis Based Voice Activity Detection
Author :
Ming Tu ; Xiang Xie ; Xingyu Na
Author_Institution :
Sch. of Inf. & Electron., Beijing Inst. of Technol., Beijing, China
Abstract :
Voice activity detection (VAD) is an important component of many speech applications. In this paper, two VAD methods using novel features based on computational auditory scene analysis (CASA) are proposed. The first builds on statistical model based VAD, using the cochleagram instead of discrete Fourier transform coefficients as the time-frequency representation. The second is a supervised method based on Gaussian mixture models (GMMs): gammatone frequency cepstral coefficients (GFCC) are extracted from the cochleagram and used to discriminate speech from noise in a noisy signal, with GMMs modeling the GFCC of speech and of noise. Both methods are evaluated in the framework of the multiple observation likelihood ratio test, and their performance is compared with several existing algorithms. The results demonstrate that the CASA based features outperform several traditional features in the VAD task, and the reasons for the superiority of the two proposed features are also investigated.
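The supervised method summarized in the abstract can be illustrated with a minimal sketch. It assumes per-frame feature vectors (for example GFCC, whose extraction is not shown) and two already-trained diagonal-covariance GMMs, and it treats the multiple observation likelihood ratio test as a sum of per-frame log-likelihood ratios over a window of 2m+1 frames compared against a threshold. The function names, the window size `m`, and the threshold are illustrative assumptions and are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    x: (T, D) features; weights: (K,); means, variances: (K, D).
    """
    diff = x[:, None, :] - means[None, :, :]                  # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                 # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)     # normalizer
    )                                                         # (T, K)
    a = log_comp + np.log(weights)
    a_max = a.max(axis=1, keepdims=True)                      # log-sum-exp trick
    return (a_max + np.log(np.exp(a - a_max).sum(axis=1, keepdims=True)))[:, 0]

def molrt_decisions(features, speech_gmm, noise_gmm, m=2, threshold=0.0):
    """Hypothetical helper: frame-wise speech/non-speech decisions.

    Sums the log-likelihood ratio over frames t-m .. t+m (edges are padded
    by repeating the first/last frame) and compares it to a threshold.
    """
    llr = gmm_loglik(features, *speech_gmm) - gmm_loglik(features, *noise_gmm)
    padded = np.pad(llr, m, mode="edge")
    summed = np.convolve(padded, np.ones(2 * m + 1), mode="valid")
    return summed > threshold
```

Each GMM is passed as a `(weights, means, variances)` tuple. In practice the threshold trades off missed speech against false alarms and would be tuned on held-out data.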
Keywords :
discrete Fourier transforms; signal detection; speech synthesis; statistical analysis; CASA; GFCC; Gaussian mixture model; VAD methods; cochleagram; computational auditory scene analysis; discrete Fourier transform coefficients; gammatone frequency cepstral coefficients; multiple observation likelihood ratio test; speech applications; statistical model; supervised method; time-frequency representation; voice activity detection; Feature extraction; Mel frequency cepstral coefficient; Noise measurement; Robustness; Signal to noise ratio; Speech
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
DOI :
10.1109/ICPR.2014.147