• DocumentCode
    3627812
  • Title
    Intersession variability compensation for language detection
  • Author
    Xi Zhou;Jiri Navratil;Jason W. Pelecanos;Ganesh N. Ramaswamy;Thomas S. Huang
  • Author_Institution
    Dept. of ECE, University of Illinois at Urbana-Champaign (UIUC), 61801, USA
  • fYear
    2008
  • Firstpage
    4157
  • Lastpage
    4160
  • Abstract
    Gaussian mixture models (GMMs) have become one of the standard acoustic approaches for language detection. These models are typically used to produce a log-likelihood ratio (LLR) verification statistic. In this framework, the intersession variability within each language becomes an adverse factor that degrades accuracy. To address this problem, we formulate the LLR as a function of the GMM parameters concatenated into normalized mean supervectors, and estimate the distribution of each language in this high-dimensional supervector space. The goal is to de-emphasize the directions with the largest intersession variability. We compare this method with two other popular intersession variability compensation methods, Nuisance Attribute Projection (NAP) and Within-Class Covariance Normalization (WCCN). Experiments on the NIST LRE 2003 and NIST LRE 2005 speech corpora show that the presented technique reduces the error by 50% relative to the baseline and performs competitively with the NAP and WCCN approaches. Fusion results with a phonotactic component are also presented.
  • Keywords
    "Support vector machines","NIST","Concatenated codes","Kernel","Acoustic signal detection","Speech","Testing","Databases","Speaker recognition","Support vector machine classification"
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    2379-190X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518570
  • Filename
    4518570