N-best-based instantaneous speaker adaptation method for speech recognition

Author

Matsui, Tomoko ; Furui, Sadaoki

Author_Institution

NTT Human Interface Labs., Tokyo, Japan

Volume

2

fYear

1996

fDate

3-6 Oct 1996

Firstpage

973

Abstract

An instantaneous speaker adaptation method is proposed that uses N-best decoding for continuous mixture-density hidden Markov model-based speech recognition systems. An N-best paradigm of multiple-pass search strategies is used that makes this method effective even for speakers whose decodings using speaker-independent models are error-prone. To cope with an insufficient amount of data, our method uses constrained maximum a posteriori estimation, in which the parameter vector space is clustered, and a mixture-mean bias is estimated for each cluster. Moreover, to maintain continuity between clusters, a bias for each mixture-mean is calculated as the weighted sum of the estimated biases. Performance evaluation using connected-digit (four-digit strings) recognition experiments performed over actual telephone lines showed more than a 20% reduction in the error rates, even for speakers whose decodings using speaker-independent models were error-prone.
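The bias step described in the abstract (one mixture-mean bias per cluster, then a weighted sum of those biases for each mixture mean) can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract gives neither the MAP prior nor the weighting function. The shrinkage form `sum / (count + tau)`, the distance-based weights, and all names (`estimate_cluster_biases`, `smoothed_bias`, `adapt_means`, `tau`, `scale`) are assumptions made for illustration. The frame-to-mixture alignment, which in the paper comes from N-best decoding, is taken as given.

```python
import math

def estimate_cluster_biases(residuals_by_cluster, dim, tau=2.0):
    """Per-cluster bias as a shrunken average of residuals.

    residuals_by_cluster[k] holds vectors (observation minus its aligned
    mixture mean) for cluster k. ASSUMPTION: a zero-mean prior on the bias,
    so the estimate is sum / (count + tau); tau is a hypothetical prior
    strength that pulls sparsely observed clusters toward zero bias.
    """
    biases = []
    for residuals in residuals_by_cluster:
        total = [0.0] * dim
        for r in residuals:
            for d in range(dim):
                total[d] += r[d]
        biases.append([t / (len(residuals) + tau) for t in total])
    return biases

def smoothed_bias(mean, centroids, biases, scale=1.0):
    """Bias for one mixture mean as a weighted sum of the cluster biases.

    ASSUMPTION: weights are proportional to exp(-squared distance / scale)
    between the mean and each cluster centroid, normalized to sum to 1.
    """
    d2 = [sum((m - c) ** 2 for m, c in zip(mean, centroid))
          for centroid in centroids]
    d2_min = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-(x - d2_min) / scale) for x in d2]
    norm = sum(weights)
    return [sum(weights[k] / norm * biases[k][d] for k in range(len(biases)))
            for d in range(len(mean))]

def adapt_means(means, centroids, biases, scale=1.0):
    """Shift every mixture mean by its smoothed bias."""
    return [[m + b for m, b in zip(mean, smoothed_bias(mean, centroids, biases, scale))]
            for mean in means]

# Toy data: two clusters in 2-D; only the first cluster has adaptation data.
centroids = [[0.0, 0.0], [10.0, 0.0]]
residuals = [[[1.0, 0.0], [1.0, 0.0]], []]
biases = estimate_cluster_biases(residuals, dim=2, tau=2.0)
adapted = adapt_means([[0.0, 0.0], [5.0, 0.0]], centroids, biases)
```

In this toy setup the cluster without data keeps a zero bias, and a mean lying midway between the two centroids receives the average of the two cluster biases, which is the kind of continuity between clusters the abstract refers to.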

Keywords

decoding; hidden Markov models; maximum likelihood estimation; software performance evaluation; speech recognition; telephony; N-best decoding; clustered parameter vector space; connected-digit recognition; constrained maximum a posteriori estimation; continuous mixture-density hidden Markov model-based speech recognition systems; error rate reduction; error-prone decoding; instantaneous speaker adaptation method; inter-cluster continuity; mixture-mean bias; multiple-pass search strategies; performance evaluation; speaker-independent models; telephone; Hidden Markov models; Humans; Laboratories; Maximum a posteriori estimation; Maximum likelihood decoding; Maximum likelihood estimation; Performance evaluation; Speech recognition; Telephony; Viterbi algorithm;

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607765

Filename

607765