• DocumentCode
    3245418
  • Title

    Hidden mode HMM using Bayesian network for modeling speaking rate fluctuation

  • Author

    Shinozaki, Takahiro ; Furui, Sadaoki

  • Author_Institution
    Dept. of Comput. Sci., Tokyo Inst. of Technol., Japan
  • fYear
    2003
  • fDate
    30 Nov.-3 Dec. 2003
  • Firstpage
    417
  • Lastpage
    422
  • Abstract
    One of the most important issues in spontaneous speech recognition is how to cope with the degradation of recognition accuracy due to speaking rate fluctuation within an utterance. This paper proposes an acoustic model for adjusting mixture weights and transition probabilities of the HMM for each frame according to the local speaking rate. The proposed model is implemented along with variants and conventional models using the Bayesian network framework. The proposed model has a hidden variable representing variation of the "mode" of the speaking rate and its value controls the parameters of the underlying HMM. Model training and maximum probability assignment of the variables are conducted using the EM/GEM and inference algorithms for Bayesian networks. Utterances from meetings and lectures are used for evaluation where Bayesian network-based acoustic models are used to rescore the utterance hypotheses obtained from a first-pass N-best list. In the experiments, the proposed model shows consistently higher performance than conventional models.
  • Keywords
    belief networks; hidden Markov models; inference mechanisms; maximum likelihood estimation; speech recognition; Bayesian network framework; EM/GEM algorithms; acoustic model; hidden mode HMM; inference algorithm; maximum probability assignment; mixture weights; model training; performance; speaking rate fluctuation; spontaneous speech recognition; transition probabilities; utterance hypotheses; Acoustic testing; Bayesian methods; Computer science; Decoding; Degradation; Fluctuations; Hidden Markov models; Inference algorithms; Speech recognition; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
  • Print_ISBN
    0-7803-7980-2
  • Type

    conf

  • DOI
    10.1109/ASRU.2003.1318477
  • Filename
    1318477