• DocumentCode
    178053
  • Title
    JFA-based front ends for speaker recognition
  • Author
    Kenny, P. ; Stafylakis, Themos ; Ouellet, Pierre ; Alam, Mohammad Jahangir
  • Author_Institution
    Centre de Rech. Inf. de Montreal (CRIM), Montreal, QC, Canada
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    1705
  • Lastpage
    1709
  • Abstract
    We discuss the limitations of the i-vector representation of speech segments in speaker recognition and explain how Joint Factor Analysis (JFA) can serve as an alternative feature extractor in a variety of ways. Building on the work of Zhao and Dong, we implemented a variational Bayes treatment of JFA which accommodates adaptation of universal background models (UBMs) in a natural way. This allows us to experiment with several types of features for speaker recognition: speaker factors and diagonal factors in addition to i-vectors, extracted with and without UBM adaptation in each case. We found that, in text-independent speaker verification experiments on NIST data, extracting i-vectors with UBM adaptation led to a 10% reduction in equal error rates although performance did not improve consistently over the whole DET curve. We achieved a further 10% reduction (with a similar inconsistency) by using speaker factors extracted with UBM adaptation as features. In text-dependent speaker recognition experiments on RSR2015 data, we were able to achieve very good performance using a JFA model with diagonal factors but no speaker factors as a feature extractor. Contrary to standard practice, this JFA model was configured so as to model speaker-phrase combinations (rather than speakers) and it was trained on utterances of very short duration (rather than whole recording sessions). We also present a variant of the length normalization trick inspired by uncertainty propagation which leads to substantial gains in performance over the whole DET curve.
  • Keywords
    speaker recognition; JFA based front ends; diagonal factors; feature extractor; i-vectors; joint factor analysis; speaker factors; speech segments; universal background models; variational Bayes treatment; Adaptation models; Feature extraction; NIST; Speech; Training; Vectors; PLDA; variational Bayes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6853889
  • Filename
    6853889