DocumentCode :
1690203
Title :
Voice activity detection using convolutive non-negative sparse coding
Author :
Peng Teng ; Yunde Jia
Author_Institution :
Sch. of Comput., Beijing Inst. of Technol., Beijing, China
fYear :
2013
Firstpage :
7373
Lastpage :
7377
Abstract :
This paper presents a voice activity detection (VAD) approach that uses convolutive non-negative sparse coding (CNSC) to improve detection performance in low signal-to-noise ratio (SNR) conditions. Our idea is to detect speech using noise-robust features from which the noise contribution has been suppressed. We first use the magnitude spectrum as a non-negative, additive low-level representation of audio signals, and learn a speech dictionary from clean speech and a noise dictionary from noise samples. The two dictionaries are then concatenated to form a global dictionary, and an audio signal is decomposed into coefficient vectors using CNSC on the global dictionary. Only the coefficients corresponding to bases from the speech dictionary are taken as features for the signal. Finally, the activity labels are obtained by decoding a conditional random field (CRF) constructed to model the context of an audio signal for VAD. Experiments demonstrate that our VAD approach performs well in low SNR conditions.
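The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-frame (non-convolutive) bases, a Euclidean reconstruction cost with an L1 penalty, and plain multiplicative updates, and it leaves out dictionary learning and CRF decoding entirely. All function names, the toy dictionaries, and the parameter `lam` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def sparse_nonneg_code(v, w, lam=0.1, iters=200, eps=1e-9):
    """Approximately minimise ||V - W H||^2 + lam * sum(H) subject to H >= 0.

    v: magnitude spectrogram, F rows (frequency bins) by T columns (frames).
    w: fixed dictionary, F rows by K columns (one column per basis).
    Uses multiplicative updates, which keep H non-negative.
    """
    wt = transpose(w)
    wtv = matmul(wt, v)  # K x T
    wtw = matmul(wt, w)  # K x K
    k, t = len(wtv), len(wtv[0])
    h = [[1.0] * t for _ in range(k)]
    for _ in range(iters):
        wtwh = matmul(wtw, h)
        h = [[h[i][j] * wtv[i][j] / (wtwh[i][j] + lam + eps)
              for j in range(t)] for i in range(k)]
    return h


def speech_features(v, w_speech, w_noise, **kwargs):
    """Decompose V on the concatenated dictionary and keep speech rows only."""
    # Global dictionary: speech bases first, then noise bases (column-wise).
    w_global = [rs + rn for rs, rn in zip(w_speech, w_noise)]
    h = sparse_nonneg_code(v, w_global, **kwargs)
    n_speech = len(w_speech[0])
    return h[:n_speech]


# Toy data (hypothetical): 3 frequency bins, 2 frames.
W_SPEECH = [[1.0], [1.0], [0.0]]  # one speech basis occupying bins 0 and 1
W_NOISE = [[0.0], [0.0], [1.0]]   # one noise basis occupying bin 2
V = [[0.0, 3.0],                  # frame 0: noise only
     [0.0, 3.0],                  # frame 1: speech plus noise
     [2.0, 2.0]]

features = speech_features(V, W_SPEECH, W_NOISE)
# The speech activation is zero in the noise-only frame and clearly positive
# in the frame that contains speech energy.
```

In this sketch the noise energy is absorbed by the noise basis, so the speech-row coefficients stay small when no speech is present; those coefficients would then serve as the per-frame observations for a sequence labeller such as a CRF.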
Keywords :
Dictionaries; Encoding; Noise robustness; Signal to noise ratio; Speech; Vectors; conditional random fields; convolutive nonnegative sparse coding; voice activity detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639095
Filename :
6639095