• DocumentCode
    1350257
  • Title

    Cluster adaptive training of hidden Markov models

  • Author

    Gales, Mark J F

  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    8
  • Issue
    4
  • fYear
    2000
  • fDate
    7/1/2000 12:00:00 AM
  • Firstpage
    417
  • Lastpage
    428
  • Abstract
    When performing speaker adaptation, there are two conflicting requirements. First, the speaker transform must be powerful enough to represent the speaker. Second, the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae are given for the cluster means, represented both explicitly and by sets of transforms of some canonical mean. On a speaker-independent task, CAT reduced the word error rate using very little adaptation data. In addition, when combined with other adaptation schemes, it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
  • Keywords
    adaptive systems; hidden Markov models; interpolation; maximum likelihood estimation; pattern clustering; speech recognition; transforms; acoustic environment; canonical mean; cluster adaptive training; cluster means; interpolation weights; linear interpolation; maximum likelihood estimates; re-estimation formulae; speaker adaptation; speaker transform; speaker-independent model set; speaker-independent task; word error rate reduction; Databases; Error analysis; Loudspeakers; Maximum likelihood linear regression; Power system modeling; Robustness
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.848223
  • Filename
    848223
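
  • Illustration
    The abstract describes the speaker-specific mean as a linear interpolation of all the cluster means, in contrast to speaker clustering, which selects a single cluster. The following is a minimal sketch of that one idea only, not the paper's implementation. The function name interpolate_mean and all numeric values are hypothetical, and the weights are supplied by hand here, whereas the paper estimates them by maximum likelihood.

```python
def interpolate_mean(cluster_means, weights):
    """Return sum_c weights[c] * cluster_means[c] for one Gaussian component.

    cluster_means: list of mean vectors, one per cluster (hypothetical values).
    weights: one interpolation weight per cluster for the current speaker.
    """
    if len(cluster_means) != len(weights):
        raise ValueError("need exactly one weight per cluster")
    dim = len(cluster_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, cluster_means))
        for d in range(dim)
    ]


# Two hypothetical clusters with two-dimensional means.
cluster_means = [[0.0, 2.0], [4.0, 6.0]]

# Selecting a single cluster (plain speaker clustering) is the special
# case in which one weight is 1 and all the others are 0.
hard = interpolate_mean(cluster_means, [1.0, 0.0])

# A general interpolation blends every cluster mean.
soft = interpolate_mean(cluster_means, [0.25, 0.75])
```

    Because a speaker is described by one weight per cluster rather than by a full transform, very few parameters need to be estimated from the adaptation data, which is the property the abstract emphasises.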