• DocumentCode
    2325503
  • Title

    Fast speaker adaptation of artificial neural networks for automatic speech recognition

  • Author

    Dupont, Stéphane ; Cheboub, Leila

  • Author_Institution
    TCTS-MULTITEL, Faculte Polytech. de Mons, Belgium
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1795
  • Abstract
    This paper presents a fast speaker adaptation technique for automatic speech recognition systems that use artificial neural networks (ANNs) for hidden Markov model (HMM) state probability estimation. Speaker-adapted ANNs are first obtained from the training data using affine transformations in the feature space. As in the “eigenvoice” approach, principal component analysis (PCA) is then applied to these transformation matrices. The first few eigenvectors span a low-dimensional space that captures most of the inter-speaker variability of the training set. During operation, these eigenvectors can be used to constrain the optimization of the transformation matrices for new speakers. This optimization is performed using steepest descent, with gradients obtained by backpropagation through the speaker-independent ANN. We use state-of-the-art hybrid HMM/ANN systems trained on the Phonebook database. Supervised adaptation experiments with different amounts of data show that the new technique outperforms standard linear regression in the feature space: with only 20 words of adaptation data, results show a 15% relative decrease in word error rate.
  • Keywords
    backpropagation; eigenvalues and eigenfunctions; estimation theory; hidden Markov models; neural nets; optimisation; principal component analysis; probability; speech recognition; HMMs state probability estimation; Phonebook database; affine transformations; artificial neural networks; automatic speech recognition; backpropagation; eigenvectors; eigenvoice; fast speaker adaptation; gradients; hidden Markov models; inter-speaker variability; optimization; performance; principal components analysis; speaker independent ANN; steepest descent; training data; training set; transformation matrices; word error rate; Artificial neural networks; Automatic speech recognition; Backpropagation; Constraint optimization; Hidden Markov models; Linear regression; Principal component analysis; Spatial databases; State estimation; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-6293-4
  • Type

    conf

  • DOI
    10.1109/ICASSP.2000.862102
  • Filename
    862102
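
The adaptation scheme summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pca_basis`, `adapt`), the learning rate, and the step count are hypothetical, and a single linear-softmax layer `W` stands in for the speaker-independent ANN so that the backpropagated gradient can be written by hand. Each speaker's affine feature transform `[A | b]` is assumed to be flattened into one row vector before PCA.

```python
import numpy as np

def pca_basis(transforms, k):
    """PCA over flattened per-speaker affine transforms.

    transforms: (S, P) array, one flattened [A | b] per training speaker.
    Returns the mean transform (P,) and the top-k eigenvectors (k, P).
    """
    mean = transforms.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(transforms - mean, full_matrices=False)
    return mean, vt[:k]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adapt(x, y, W, mean, basis, steps=200, lr=0.05):
    """Steepest descent on the PCA weights c for a new speaker.

    The transform is constrained to theta = mean + c @ basis, reshaped to
    [A | b], and applied as x' = A x + b. W is a fixed linear-softmax
    classifier standing in for the speaker-independent ANN (assumption).
    Returns the weights c and the cross-entropy loss at each step.
    """
    n, d = x.shape
    c = np.zeros(basis.shape[0])
    x1 = np.hstack([x, np.ones((n, 1))])          # append 1 for the bias b
    onehot = np.eye(W.shape[1])[y]
    losses = []
    for _ in range(steps):
        theta = (mean + c @ basis).reshape(d, d + 1)
        xt = x1 @ theta.T                          # transformed features
        p = softmax(xt @ W)                        # state posteriors
        losses.append(-np.mean(np.log(p[np.arange(n), y] + 1e-12)))
        g_xt = (p - onehot) @ W.T / n              # gradient w.r.t. features
        g_theta = g_xt.T @ x1                      # gradient w.r.t. [A | b]
        c -= lr * (basis @ g_theta.ravel())        # project onto eigenvectors
    return c, losses
```

Because only the `k` weights in `c` are estimated rather than all `d * (d + 1)` entries of the transform, the number of free parameters stays small, which is the property the abstract relies on when only a few words of adaptation data are available.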