Robust visual features for the multimodal identification of unregistered speakers in TV talk-shows

Author

Vallet, Félicien ; Essid, Slim ; Carrive, Jean ; Richard, Gaël

Author_Institution

CNRS/LTCI, Telecom ParisTech, Paris, France

fYear

2010

fDate

26-29 Sept. 2010

Firstpage

1469

Lastpage

1472

Abstract

In this paper, we propose a novel multimodal method for identifying unregistered speakers in a TV talk-show using a semi-supervised learning approach based on Support Vector Machines. Our study shows that specific visual features are very effective for this particular type of video content, which is edited from multi-camera recordings. These visual features, motivated by prior knowledge of how the TV director chooses the appropriate shots, bring a significant improvement in identification accuracy when used together with classic audio Mel-frequency cepstral coefficients (+8% compared to various baseline systems, in particular a standard audio-only system).

Keywords

feature extraction; learning (artificial intelligence); multimedia systems; speaker recognition; support vector machines; television; video cameras; video recording; video signal processing; TV director; TV talk-show; audio Mel-frequency cepstral coefficient; multicamera recording; multimedia system; multimodal identification; semisupervised learning; support vector machine; unregistered speaker identification; video content; visual feature; Face; Image color analysis; Robustness; Speech; Visualization; image analysis; multimedia databases; pattern classification

fLanguage

English

Publisher

IEEE

Conference_Titel

Image Processing (ICIP), 2010 17th IEEE International Conference on

Conference_Location

Hong Kong

ISSN

1522-4880

Print_ISBN

978-1-4244-7992-4

Electronic_ISBN

1522-4880

Type

conf

DOI

10.1109/ICIP.2010.5653393

Filename

5653393