• DocumentCode
    312146
  • Title
    N-best-based instantaneous speaker adaptation method for speech recognition
  • Author
    Matsui, Tomoko; Furui, Sadaoki
  • Author_Institution
    NTT Human Interface Labs., Tokyo, Japan
  • Volume
    2
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    973
  • Abstract
    An instantaneous speaker adaptation method is proposed that uses N-best decoding for continuous mixture-density hidden Markov model-based speech recognition systems. An N-best paradigm of multiple-pass search strategies is used that makes this method effective even for speakers whose decodings using speaker-independent models are error-prone. To cope with an insufficient amount of data, our method uses constrained maximum a posteriori estimation, in which the parameter vector space is clustered, and a mixture-mean bias is estimated for each cluster. Moreover, to maintain continuity between clusters, a bias for each mixture-mean is calculated as the weighted sum of the estimated biases. Performance evaluation using connected-digit (four-digit strings) recognition experiments performed over actual telephone lines showed more than a 20% reduction in the error rates, even for speakers whose decodings using speaker-independent models were error-prone.
  • Keywords
    decoding; hidden Markov models; maximum likelihood estimation; software performance evaluation; speech recognition; telephony; N-best decoding; clustered parameter vector space; connected-digit recognition; constrained maximum a posteriori estimation; continuous mixture-density hidden Markov model-based speech recognition systems; error rate reduction; error-prone decoding; instantaneous speaker adaptation method; inter-cluster continuity; mixture-mean bias; multiple-pass search strategies; performance evaluation; speaker-independent models; telephone; Hidden Markov models; Humans; Laboratories; Maximum a posteriori estimation; Maximum likelihood decoding; Maximum likelihood estimation; Performance evaluation; Speech recognition; Telephony; Viterbi algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607765
  • Filename
    607765
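
The abstract's last adaptation step, in which each mixture mean receives a bias formed as a weighted sum of the per-cluster biases, can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen, so the softmax over negative squared distances to cluster centroids used here is an assumption, as are the names `smoothed_biases`, `adapt_means`, and `temperature`. The per-cluster biases are taken as given; the constrained MAP estimation and N-best decoding that would produce them are not shown.

```python
import math


def smoothed_biases(means, centroids, cluster_biases, temperature=1.0):
    """Return one bias vector per mixture mean.

    Each bias is a weighted sum of the per-cluster biases. The weights are a
    softmax over negative squared distances to the cluster centroids, which is
    an assumed weighting and not one stated in the abstract.
    """
    result = []
    for mean in means:
        # Squared Euclidean distance from this mean to every centroid.
        d2 = [sum((m - c) ** 2 for m, c in zip(mean, cen)) for cen in centroids]
        # Shift by the minimum distance for numerical stability.
        d_min = min(d2)
        raw = [math.exp(-(d - d_min) / temperature) for d in d2]
        total = sum(raw)
        weights = [r / total for r in raw]
        # Weighted sum of the cluster biases, one dimension at a time.
        bias = [
            sum(w * b[k] for w, b in zip(weights, cluster_biases))
            for k in range(len(mean))
        ]
        result.append(bias)
    return result


def adapt_means(means, biases):
    """Shift every mixture mean by its smoothed bias."""
    return [[m + b for m, b in zip(mean, bias)] for mean, bias in zip(means, biases)]


# Toy data (hypothetical): three 2-D mixture means and two clusters.
means = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
centroids = [[0.0, 0.0], [2.0, 0.0]]
cluster_biases = [[1.0, 0.0], [-1.0, 0.0]]

biases = smoothed_biases(means, centroids, cluster_biases)
adapted = adapt_means(means, biases)
```

In this toy setup the middle mean is equidistant from both centroids, so the two opposing cluster biases cancel and it stays in place, while the outer means move mostly by the bias of their own cluster. That smooth transition is the kind of inter-cluster continuity the abstract refers to.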