Intersession variability compensation for language detection

Author

Xi Zhou;Jiri Navratil;Jason W. Pelecanos;Ganesh N. Ramaswamy;Thomas S. Huang

Author_Institution

Dept. of ECE, University of Illinois at Urbana-Champaign (UIUC), 61801, USA

Year

2008

Firstpage

4157

Lastpage

4160

Abstract

Gaussian mixture models (GMMs) have become one of the standard acoustic approaches to language detection. These models are typically used to produce a log-likelihood ratio (LLR) verification statistic. In this framework, intersession variability within each language degrades accuracy. To address this problem, we formulate the LLR as a function of the GMM parameters concatenated into normalized mean supervectors, and estimate the distribution of each language in this high-dimensional supervector space. The goal is to de-emphasize the directions with the largest intersession variability. We compare this method with two other popular intersession variability compensation methods, Nuisance Attribute Projection (NAP) and Within-Class Covariance Normalization (WCCN). Experiments on the NIST LRE 2003 and NIST LRE 2005 speech corpora show that the presented technique reduces the error by 50% relative to the baseline and performs competitively with the NAP and WCCN approaches. Fusion results with a phonotactic component are also presented.
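
The abstract does not give the estimator used by the proposed method, so the following is only a minimal sketch of one of the comparison techniques, NAP, as it is commonly described: estimate the directions of largest within-language variation among supervectors and project them out. The function name `nap_projection`, its parameters, and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def nap_projection(supervectors, labels, k):
    """Hypothetical helper: return P = I - U U^T, where the columns of U
    are the top-k directions of within-language (session) variability."""
    X = np.asarray(supervectors, dtype=float)
    labels = np.asarray(labels)

    # Subtract each language's mean supervector so that only the
    # within-language (intersession) variation remains.
    centered = X.copy()
    for lang in np.unique(labels):
        mask = labels == lang
        centered[mask] -= X[mask].mean(axis=0)

    # The right singular vectors of the centered data are the eigenvectors
    # of the within-language scatter matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    U = vt[:k].T  # shape (dim, k)

    # Orthogonal projection onto the complement of the nuisance subspace.
    return np.eye(X.shape[1]) - U @ U.T
```

Because `P` is symmetric, compensated supervectors can be obtained as `X @ P`. The number of removed directions `k` is a tuning parameter in this sketch; the abstract does not state what value, if any, the authors used.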

Keywords

"Support vector machines","NIST","Concatenated codes","Kernel","Acoustic signal detection","Speech","Testing","Databases","Speaker recognition","Support vector machine classification"

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

ISSN

1520-6149

Print_ISBN

978-1-4244-1483-3

Electronic_ISSN

2379-190X

Type

conf

DOI

10.1109/ICASSP.2008.4518570

Filename

4518570