• DocumentCode
    1326219
  • Title

    Multilevel and Session Variability Compensated Language Recognition: ATVS-UAM Systems at NIST LRE 2009

  • Author

    Gonzalez-Dominguez, Javier ; Lopez-Moreno, Ignacio ; Franco-Pedroso, Javier ; Ramos, Daniel ; Toledano, Doroteo Torre ; Gonzalez-Rodriguez, Joaquin

  • Author_Institution
    Escuela Politec. Super., Univ. Autonoma de Madrid, Madrid, Spain
  • Volume
    4
  • Issue
    6
  • fYear
    2010
  • Firstpage
    1084
  • Lastpage
    1093
  • Abstract
    This paper presents the systems submitted by the ATVS Biometric Recognition Group to the 2009 Language Recognition Evaluation (LRE'09), organized by NIST. New challenges included in this LRE edition can be summarized by three main differences with respect to past evaluations. First, the number of languages to be recognized expanded to 23 languages from 14 in 2007, and 7 in 2005. Second, the data variability has been increased by including telephone speech excerpts extracted from Voice of America (VOA) radio broadcasts over the Internet in addition to conversational telephone speech (CTS). The third difference was the volume of data, involving in this evaluation up to 2 terabytes of speech data for development, which is an order of magnitude greater than past evaluations. LRE'09 thus required participants to develop robust systems able not only to successfully face the session variability problem but also to do it with reasonable computational resources. ATVS participation consisted of state-of-the-art acoustic and high-level systems focussing on these issues. Furthermore, the problem of finding a proper combination and calibration of the information obtained at different levels of the speech signal was widely explored in this submission. In this paper, two original contributions were developed. The first contribution was applying a session variability compensation scheme based on factor analysis (FA) within the statistics domain to an SVM-supervector (SVM-SV) approach. The second contribution was the employment of a novel back-end based on anchor models in order to fuse individual systems prior to one-versus-all calibration via logistic regression. Results both in development and evaluation corpora show the robustness and excellent performance of the submitted systems, exemplified by our system ranked second in the 30-second open-set condition, with remarkably scarce computational resources.
  • Keywords
    acoustic signal processing; biometrics (access control); natural language processing; regression analysis; speech recognition; support vector machines; ATVS biometric recognition group; ATVS participation; ATVS-UAM systems; CTS; Internet; LRE edition; NIST LRE 2009; SVM-SV approach; SVM-supervector approach; Voice of America radio broadcasts; conversational telephone speech; data variability; factor analysis; high-level systems; language recognition evaluation; logistic regression; session variability compensated language recognition; session variability compensation scheme; session variability problem; speech data; speech signal; state-of-the-art acoustic systems; statistics domain; telephone speech excerpts; Acoustics; Biometrics; Calibration; Computational modeling; Mathematical model; Speech; Speech recognition; Support vector machines; Anchor models; calibration; factor analysis (FA); language recognition; linear scoring; sufficient statistics;
  • fLanguage
    English
  • Journal_Title
    Selected Topics in Signal Processing, IEEE Journal of
  • Publisher
    IEEE
  • ISSN
    1932-4553
  • Type

    jour

  • DOI
    10.1109/JSTSP.2010.2076071
  • Filename
    5575380