Nearest neighbor based i-vector normalization for robust speaker recognition under unseen channel conditions

Author

Zhu, Weizhong ; Sadjadi, Seyed Omid ; Pelecanos, Jason W.

Author_Institution

Watson Group, IBM Research, Yorktown Heights, NY, USA

fYear

2015

fDate

19-24 April 2015

Firstpage

4684

Lastpage

4688

Abstract

Many state-of-the-art speaker recognition engines use i-vectors to represent variable-length acoustic signals in a fixed low-dimensional total variability subspace. While such systems perform well under seen channel conditions, their performance greatly degrades under unseen channel scenarios. Accordingly, rapid adaptation of i-vector systems to unseen conditions has recently attracted significant research effort from the community. To mitigate this mismatch, in this paper we propose nearest neighbor based i-vector mean normalization (NN-IMN) and i-vector smoothing (IS) for unsupervised adaptation to unseen channel conditions within a state-of-the-art i-vector/PLDA speaker verification framework. A major advantage of the approach is its ability to handle multiple unseen channels without explicit retraining or clustering. Our observations on the DARPA Robust Automatic Transcription of Speech (RATS) speaker recognition task suggest that part of the distortion caused by an unseen channel may be modeled as an offset in the i-vector space. Hence, the proposed nearest neighbor based normalization technique is formulated to compensate for such a shift. Experimental results with the NN based normalized i-vectors indicate that, on average, we can recover 46% of the total performance degradation due to unseen channel conditions.
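The abstract does not give the exact NN-IMN procedure, so the following is only a minimal sketch of the general idea it describes: if an unseen channel shifts i-vectors by an offset, that offset might be estimated locally as the mean of an i-vector's nearest neighbors and then subtracted. The function name `nn_mean_normalize`, the use of Euclidean distance, the unlabeled `pool`, and the neighbor count `k` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_mean_normalize(ivector, pool, k=2):
    """Subtract the mean of the k nearest pool vectors from ivector.

    Hypothetical helper: the distance metric, the pool contents, and k
    are illustrative assumptions, not the authors' settings.
    """
    # Rank the unlabeled pool by distance to the query i-vector.
    neighbors = sorted(pool, key=lambda p: euclidean(ivector, p))[:k]
    # Local mean of the neighbors, taken as an estimate of the channel offset.
    local_mean = [
        sum(n[d] for n in neighbors) / len(neighbors)
        for d in range(len(ivector))
    ]
    # Compensate for the estimated shift.
    return [x - m for x, m in zip(ivector, local_mean)]


# Toy 2-D "i-vectors" from two channels with opposite offsets.
pool = [[5.0, 5.0], [5.2, 4.8], [-5.0, -5.0], [-4.8, -5.2]]
normalized = nn_mean_normalize([5.1, 5.1], pool, k=2)
```

Because the offset is estimated from each vector's own neighborhood, vectors from different channels in the toy pool are centered separately without any clustering step, which is in the spirit of the multi-channel advantage the abstract mentions.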

Keywords

acoustic signal processing; distortion; probability; smoothing methods; speaker recognition; vectors; DARPA; NN based normalized i-vectors; NN-IMN; RATS speaker recognition task; i-vector smoothing; i-vector space; i-vector systems; i-vector-PLDA speaker verification; low-dimensional total variability subspace; nearest neighbor based i-vector normalization; nearest neighbor based normalization technique; probabilistic linear discriminant analysis; robust automatic transcription of speech; robust speaker recognition; speaker recognition engines; variable-length acoustic signals; Degradation; RNA; Training; PLDA; i-vector; nearest neighbor; speaker recognition; unsupervised adaptation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178859

Filename

7178859