A multimodal approach to extract optimized audio features for speaker detection

Author

Besson, Patricia ; Kunt, Murat ; Butz, Torsten ; Thiran, Jean-Philippe

Author_Institution

Signal Process. Inst. (ITS), Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland

fYear

2005

fDate

4-8 Sept. 2005

Firstpage

Lastpage

Abstract

We present a method that exploits the information-theoretic framework described in [1] to extract audio features that are optimal with respect to the video features. A simple measure of the mutual information between the resulting audio features and the video features makes it possible to detect the active speaker among several candidates. The results show that our method is able to exploit the shared speech information contained in the audio and video signals to recover their common source.
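
As a rough illustration of the detection step mentioned in the abstract, the sketch below scores each candidate by the mutual information between one audio feature sequence and that candidate's video feature sequence, then picks the highest score. The histogram-based estimator, the function names, and the synthetic one-dimensional features are all assumptions made for illustration; the abstract does not specify the estimator, and the optimized audio feature extraction of [1] is not reproduced here.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in bits for two 1-D sequences.

    Assumption: a plain joint-histogram estimator, chosen for brevity.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def detect_active_speaker(audio_feature, video_features, bins=16):
    """Return the index of the candidate whose video feature shares the
    most information with the audio feature, plus all scores."""
    scores = [mutual_information(audio_feature, v, bins) for v in video_features]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    # Synthetic stand-ins (hypothetical): candidate 1's "mouth" feature
    # follows the audio, candidate 0's is independent noise.
    rng = np.random.default_rng(0)
    n = 2000
    audio = rng.normal(size=n)
    mouth_0 = rng.normal(size=n)
    mouth_1 = audio + 0.3 * rng.normal(size=n)
    speaker, scores = detect_active_speaker(audio, [mouth_0, mouth_1])
    print("detected speaker:", speaker)
    print("scores (bits):", scores)
```

Histogram estimates of mutual information are biased upward for finite samples, so in a sketch like this only the ranking of the candidates is meaningful, not the absolute values.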

Keywords

audio signal processing; feature extraction; image recognition; speaker recognition; video signal processing; active speaker detection; information theoretic framework; multimodal approach; optimized audio feature extraction; shared speech information; speaker detection; video features; video signals; Data mining; Feature extraction; Markov processes; Mouth; Mutual information; Optimization; Speech;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing Conference, 2005 13th European

Conference_Location

Antalya

Print_ISBN

978-160-4238-21-1

Type

conf

Filename

7078419

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=698825