DocumentCode :
730734
Title :
Nearest neighbor based i-vector normalization for robust speaker recognition under unseen channel conditions
Author :
Zhu, Weizhong ; Sadjadi, Seyed Omid ; Pelecanos, Jason W.
Author_Institution :
Watson Group, IBM Res., Yorktown Heights, NY, USA
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4684
Lastpage :
4688
Abstract :
Many state-of-the-art speaker recognition engines use i-vectors to represent variable-length acoustic signals in a fixed low-dimensional total variability subspace. While such systems perform well under seen channel conditions, their performance greatly degrades under unseen channel scenarios. Accordingly, rapid adaptation of i-vector systems to unseen conditions has recently attracted significant research effort from the community. To mitigate this mismatch, in this paper we propose nearest neighbor based i-vector mean normalization (NN-IMN) and i-vector smoothing (IS) for unsupervised adaptation to unseen channel conditions within a state-of-the-art i-vector/PLDA speaker verification framework. A major advantage of the approach is its ability to handle multiple unseen channels without explicit retraining or clustering. Our observations on the DARPA Robust Automatic Transcription of Speech (RATS) speaker recognition task suggest that part of the distortion caused by an unseen channel may be modeled as an offset in the i-vector space. Hence, the proposed nearest neighbor based normalization technique is formulated to compensate for such a shift. Experimental results with the NN based normalized i-vectors indicate that, on average, we can recover 46% of the total performance degradation due to unseen channel conditions.
Keywords :
acoustic signal processing; distortion; probability; smoothing methods; speaker recognition; vectors; DARPA; NN based normalized i-vectors; NN-IMN; RATS speaker recognition task; i-vector smoothing; i-vector space; i-vector systems; i-vector-PLDA speaker verification; low-dimensional total variability subspace; nearest neighbor based i-vector normalization; nearest neighbor based normalization technique; probabilistic linear discriminant analysis; robust automatic transcription of speech; robust speaker recognition; speaker recognition engines; variable-length acoustic signals; Degradation; RNA; Training; PLDA; i-vector; nearest neighbor; speaker recognition; unsupervised adaptation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178859
Filename :
7178859