• DocumentCode
    640552
  • Title
    On complementarity of state-of-the-art speaker recognition systems
  • Author
    Machlica, Lukas ; Zajic, Zbynek ; Muller, Lukas
  • Author_Institution
    Dept. of Cybern., Univ. of West Bohemia, Pilsen, Czech Republic
  • fYear
    2012
  • fDate
    12-15 Dec. 2012
  • Abstract
    In this paper, recent methods used in the task of Speaker Recognition (SR) are reviewed and their complementarity is analysed. First, methods based on Supervectors (SVs) related to Gaussian Mixture Models (GMMs), with Support Vector Machines (SVMs) used as a discriminative model, are described along with Nuisance Attribute Projection (NAP). NAP was proposed to suppress the undesirable influence of high channel variability between several sessions of a speaker. Next, recent methods focusing on the extraction of so-called i-vectors (low-dimensional representations of GMM-based SVs) are discussed. The space in which i-vectors lie is denoted the Total Variability Space (TVS), since it contains both between-speaker and session/channel variabilities. Once i-vectors have been extracted, a Probabilistic Linear Discriminant Analysis (PLDA) model is trained in the TVS. In the training phase of PLDA, the TVS is decomposed into a channel subspace and a speaker subspace; hence each i-vector is assumed to be composed of a speaker identity component and a channel component. The complementarity of PLDA- and SVM-based modelling techniques is examined using linear logistic regression as a fusion tool to combine the verification scores of the individual systems, leading to significant reductions in the error rates of the SR system. The results are presented on the NIST SRE 2008 and NIST SRE 2010 corpora.
  • Keywords
    Gaussian processes; regression analysis; sensor fusion; speaker recognition; support vector machines; vectors; GMM; Gaussian mixture models; NIST SRE 2008 corpora; NIST SRE 2010 corpora; PLDA model; SR system; SVM; discriminative model; fusion tool; linear logistic regression; nuisance attribute projection; probabilistic linear discriminant analysis; speaker channel component; speaker channel variability; speaker identity component; speaker recognition systems; supervectors; support vector machines; total variability space; verification scores; Switches; NAP; PLDA; SVM; fusion; i-vector;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Information Technology (ISSPIT), 2012 IEEE International Symposium on
  • Conference_Location
    Ho Chi Minh City
  • Print_ISBN
    978-1-4673-5604-6
  • Type
    conf
  • DOI
    10.1109/ISSPIT.2012.6621280
  • Filename
    6621280