Title :
Audio-visual speaker identification with multi-view distance metric learning
Author :
Zheng, Haomian ; Wang, Meng ; Li, Zhu
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China
Abstract :
Both audio and visual information can be useful for speaker identification in videos. This paper proposes an audio-visual speaker identification approach built on a multi-view distance metric learning method. Our metric learning scheme builds distance measures based not only on the label information of the training data but also on the consistency between different views. In this way, better metrics can be learned than by learning a metric for each view individually. We conduct experiments on the VidTIMIT dataset, and the empirical results demonstrate the effectiveness of our approach over a set of existing methods. We also apply our method to a multi-view digit recognition task and obtain encouraging results.
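The abstract does not give the objective function, so the following is only a minimal sketch of what a multi-view metric learning loss of this general kind might look like, not the authors' formulation. It assumes one diagonal (per-feature weighted) squared distance per view, a pairwise label term that pulls same-speaker pairs together and pushes different-speaker pairs beyond a margin, and a consistency term that penalizes views disagreeing on the distance of the same pair. The names `view_distance`, `multi_view_loss`, `margin`, and `lam` are all hypothetical.

```python
from itertools import combinations


def view_distance(weights, a, b):
    """Squared distance under a diagonal metric: sum_k w_k * (a_k - b_k)^2."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def multi_view_loss(views, weights, labels, margin=1.0, lam=0.5):
    """Illustrative objective (an assumption, not the paper's formulation).

    views   -- dict: view name -> list of feature vectors (same sample order)
    weights -- dict: view name -> non-negative per-feature weights
    labels  -- list of class labels, one per sample
    """
    names = list(views)
    label_term = 0.0
    consistency_term = 0.0
    for i, j in combinations(range(len(labels)), 2):
        # Distance of this pair as measured in every view.
        dists = {v: view_distance(weights[v], views[v][i], views[v][j])
                 for v in names}
        for d in dists.values():
            if labels[i] == labels[j]:
                label_term += d                     # pull same-label pairs together
            else:
                label_term += max(0.0, margin - d)  # push other pairs past the margin
        # Penalize disagreement between views on the same pair.
        for u, v in combinations(names, 2):
            consistency_term += (dists[u] - dists[v]) ** 2
    return label_term + lam * consistency_term


# Toy data: three samples, one feature per view, two speakers.
toy_views = {"audio": [[0.0], [0.0], [2.0]], "visual": [[0.0], [1.0], [3.0]]}
toy_weights = {"audio": [1.0], "visual": [1.0]}
toy_labels = ["a", "a", "b"]
print(multi_view_loss(toy_views, toy_weights, toy_labels))
```

Setting `lam=0` reduces the sketch to learning each view's metric independently, which is the baseline the abstract compares against; a real implementation would minimize the loss over the weights rather than only evaluate it.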
Keywords :
learning (artificial intelligence); speaker recognition; audio information; multi view digit recognition; multi view distance metric learning; speaker identification; visual information; Conferences; Feature extraction; Learning systems; Machine learning; Measurement; Training; Visualization; Audio-Visual; Distance Metric Learning; Multi-view; Speaker Recognition;
Conference_Title :
Image Processing (ICIP), 2010 17th IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-7992-4
ISSN :
1522-4880
DOI :
10.1109/ICIP.2010.5653016