DocumentCode
681700
Title
Look who's talking
Author
D'Arca, Eleonora ; Robertson, Neil M. ; Hopgood, James
Author_Institution
Heriot Watt Univ., Edinburgh, UK
fYear
2013
fDate
2-3 Dec. 2013
Firstpage
1
Lastpage
6
Abstract
This paper proposes a method to automatically detect and localise the dominant speaker in a conversation using audio and video information. The underlying idea is that gesturing indicates speaking, so we look for people's hand or head movements to infer that a person is talking. In a normal conversational context with two or more people, we learn Mel-frequency cepstral coefficients (MFCCs) and use canonical correlation analysis (CCA) to find how they correlate with the optical flow associated with moving pixel regions. In complex scenarios, this operation could associate pixel regions with sounds that are not actually correlated with them. Therefore, we also triangulate the information coming from the microphones to estimate the position of the actual audio source, narrowing the visual search space and hence reducing the probability of a wrong voice-to-pixel-region association. We compare our work with an existing state-of-the-art algorithm and show, on real data, an improvement in dominant speaker localization.
Keywords
cepstral analysis; correlation methods; gesture recognition; speaker recognition; CCA; MFCC; Mel-frequency cepstral coefficient; actual audio source; audio information; canonical correlation analysis; dominant speaker detection; dominant speaker localization; microphone; optical flow; people hands movement; people heads movement; video information; voice to pixel region association
fLanguage
English
Publisher
IET
Conference_Titel
Intelligent Signal Processing Conference 2013 (ISP 2013), IET
Conference_Location
London
Electronic_ISBN
978-1-84919-774-8
Type
conf
DOI
10.1049/cp.2013.2075
Filename
6740524