DocumentCode :
2974911
Title :
Multi-view learning of acoustic features for speaker recognition
Author :
Livescu, Karen ; Stoehr, Mark
Author_Institution :
TTI-Chicago, Chicago, IL, USA
fYear :
2009
fDate :
Dec. 13-17, 2009
Firstpage :
82
Lastpage :
86
Abstract :
We consider learning acoustic feature transformations using an additional view of the data, in this case video of the speaker's face. Specifically, we consider a scenario in which clean audio and video are available at training time, while at test time only noisy audio is available. We use canonical correlation analysis (CCA) to learn linear projections of the acoustic observations that have maximum correlation with the video frames. We provide an initial demonstration of the approach on a speaker recognition task using data from the VidTIMIT corpus. The projected features, in combination with baseline MFCCs, outperform the baseline recognizer in noisy conditions. The techniques we present are quite general, although here we apply them to a specific speaker recognition task. This is the first work of which we are aware in which multiple views are used to learn an acoustic feature projection at training time, while using only the acoustics at test time.
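A minimal sketch (not the authors' code) of the CCA step described in the abstract, assuming frame-aligned acoustic and video feature matrices and using scikit-learn's CCA implementation; the variable names, dimensions, and random stand-in data are illustrative assumptions only:

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Assumed stand-ins for frame-aligned training data:
#   audio_train: MFCC frames (n_frames x 39)
#   video_train: face-image features (n_frames x 100)
audio_train = rng.standard_normal((5000, 39))
video_train = rng.standard_normal((5000, 100))

# Learn linear projections of the acoustic observations that are maximally
# correlated with the video view (the CCA step named in the abstract).
cca = CCA(n_components=20)
cca.fit(audio_train, video_train)

# At test time only (noisy) audio is available: apply the learned
# acoustic-side projection and append it to the baseline MFCCs.
audio_test = rng.standard_normal((1000, 39))
projected = cca.transform(audio_test)          # audio-view canonical variates
combined = np.hstack([audio_test, projected])  # baseline MFCCs + CCA features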
Keywords :
speaker recognition; acoustic feature projection; acoustic feature transformations; canonical correlation analysis; multi-view learning; multiple views; Acoustic noise; Acoustic testing; Automatic speech recognition; Feature extraction; Focusing; Linear discriminant analysis; Loudspeakers; Principal component analysis; Speaker recognition; Video recording;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373462
Filename :
5373462