DocumentCode :
2918277
Title :
Voice activity detection using audio-visual information
Author :
Petsatodis, Theodoros ; Pnevmatikakis, Aristodemos ; Boukis, Christos
Author_Institution :
CTiF, Univ. of Aalborg, Aalborg, Greece
fYear :
2009
fDate :
5-7 July 2009
Firstpage :
1
Lastpage :
5
Abstract :
An audio-visual voice activity detector that uses sensors positioned far from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features using hidden Markov models; their outcomes are fused using a post-decision scheme. The Mel-frequency cepstral coefficients and the vertical mouth opening are the chosen audio and visual features, respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers under various levels of additive white Gaussian noise, and it achieves performance superior to that of either unimodal component alone.
Keywords :
AWGN; audio-visual systems; hidden Markov models; object detection; sensors; speech processing; Mel-frequency cepstral coefficient; additive white Gaussian noise; audio-visual information; hidden Markov model; post-decision scheme; sensor; unimodal detector; voice activity detection; Ambient intelligence; Audio recording; Detectors; Information technology; Intelligent sensors; Mouth; Principal component analysis; Signal to noise ratio; Adaptive estimation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Signal Processing, 2009 16th International Conference on
Conference_Location :
Santorini-Hellas
Print_ISBN :
978-1-4244-3297-4
Electronic_ISBN :
978-1-4244-3298-1
Type :
conf
DOI :
10.1109/ICDSP.2009.5201171
Filename :
5201171