DocumentCode :
3363945
Title :
Robust visual features for the multimodal identification of unregistered speakers in TV talk-shows
Author :
Vallet, Félicien ; Essid, Slim ; Carrive, Jean ; Richard, Gaël
Author_Institution :
CNRS/LTCI, Telecom ParisTech, Paris, France
fYear :
2010
fDate :
26-29 Sept. 2010
Firstpage :
1469
Lastpage :
1472
Abstract :
In this paper, we propose a novel multimodal method for identifying unregistered speakers in a TV talk-show using a semi-supervised learning approach based on Support Vector Machines. Our study highlights the fact that specific visual features prove to be very efficient for this particular type of video content, which is edited from multi-camera recordings. These visual features, motivated by prior knowledge of the approach followed by the TV director in choosing the appropriate shots, are found to bring a significant improvement in identification accuracy when used together with classic audio Mel-frequency cepstral coefficients (+8% compared to various baseline systems, in particular a standard audio-only system).
Keywords :
feature extraction; learning (artificial intelligence); multimedia systems; speaker recognition; support vector machines; television; video cameras; video recording; video signal processing; TV director; TV talk-show; audio Mel-frequency cepstral coefficient; multicamera recording; multimedia system; multimodal identification; semisupervised learning; support vector machine; unregistered speaker identification; video content; visual feature; Face; Feature extraction; Image color analysis; Robustness; Speech; Support vector machines; Visualization; image analysis; multimedia databases; pattern classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Image Processing (ICIP), 2010 17th IEEE International Conference on
Conference_Location :
Hong Kong
ISSN :
1522-4880
Print_ISBN :
978-1-4244-7992-4
Electronic_ISBN :
1522-4880
Type :
conf
DOI :
10.1109/ICIP.2010.5653393
Filename :
5653393