• DocumentCode
    977707
  • Title
    Speaker adaptation using constrained estimation of Gaussian mixtures
  • Author
    Digalakis, Vassilios V. ; Rtischev, Dimitry ; Neumeyer, Leonardo G.
  • Author_Institution
    SRI Int., Menlo Park, CA, USA
  • Volume
    3
  • Issue
    5
  • fYear
    1995
  • fDate
    9/1/1995 12:00:00 AM
  • Firstpage
    357
  • Lastpage
    366
  • Abstract
    A growing trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Although these systems achieve good recognition performance on average in large-vocabulary applications, performance varies widely across speakers, and it degrades dramatically when the user is radically different from the training population. A popular technique for improving the performance and robustness of a speech recognition system is to adapt the speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs, the number of component densities is typically very large, so it may not be feasible to acquire enough adaptation data for robust maximum-likelihood estimates. To solve this problem, the authors propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.
  • Keywords
    Gaussian processes; error statistics; hidden Markov models; speech recognition; American English; Gaussian mixtures; automatic speech recognition systems; component densities; constrained estimation; continuous mixture-density hidden Markov models; error rate; large-vocabulary Wall Street Journal corpus; performance; robust maximum-likelihood estimate; speaker adaptation; Automatic speech recognition; Degradation; Error analysis; Hidden Markov models; Maximum likelihood estimation; Probability distribution; Robustness; Speech recognition; Training data; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.466659
  • Filename
    466659