DocumentCode :
3358687
Title :
Audio-visual speaker identification with multi-view distance metric learning
Author :
Zheng, Haomian ; Wang, Meng ; Li, Zhu
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China
fYear :
2010
fDate :
26-29 Sept. 2010
Firstpage :
4561
Lastpage :
4564
Abstract :
Both audio and visual information can be useful for speaker identification in videos. This paper proposes an audio-visual speaker identification approach that benefits from a multi-view distance metric learning method. Our metric learning scheme builds distance measures based not only on the label information of the training data but also on the consistency between different views. In this way, better metrics can be learned than when a metric is learned for each view individually. We conduct experiments on the VidTIMIT dataset, and the empirical results demonstrate the effectiveness of our approach over a set of existing methods. We also apply our method to a multi-view digit recognition task and obtain encouraging results.
Keywords :
learning (artificial intelligence); speaker recognition; audio information; multi view digit recognition; multi view distance metric learning; speaker identification; visual information; Conferences; Feature extraction; Learning systems; Machine learning; Measurement; Training; Visualization; Audio-Visual; Distance Metric Learning; Multi-view; Speaker Recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Image Processing (ICIP), 2010 17th IEEE International Conference on
Conference_Location :
Hong Kong
ISSN :
1522-4880
Print_ISBN :
978-1-4244-7992-4
Electronic_ISBN :
1522-4880
Type :
conf
DOI :
10.1109/ICIP.2010.5653016
Filename :
5653016