Title of article :
Bayesian learning of speech duration models
Author/Authors :
Chien, Jen-Tzung (Author) ; Huang, Chih-Hsien (Author)
Issue Information :
Periodical with serial issue number, year 2003
Pages :
-557
From page :
558
To page :
0
Abstract :
This paper presents Bayesian modeling and learning of speech duration for hidden Markov model (HMM) based speech recognition. We focus on sequential learning of HMM state durations using the quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. We investigate the Gaussian, Poisson, and gamma distributions as duration models, and develop the maximum a posteriori (MAP) estimate of the gamma duration model. To exploit sequential learning, we adopt the Poisson duration model with a gamma prior density, which belongs to the conjugate prior family. When adaptation data are observed sequentially, the resulting gamma posterior density offers two advantages. First, it determines the optimal QB duration parameter, which can be merged into the HMMs for speech recognition. Second, it provides the mechanism for updating the gamma prior statistics during sequential learning. The EM algorithm is applied to carry out QB parameter estimation, and the overall HMM parameters can be adapted simultaneously. In the experiments, the proposed adaptive duration model improves speech recognition performance on Mandarin broadcast news and noisy connected digits. Batch learning is investigated for the MAP duration model and sequential learning for the QB duration model.
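The conjugate pairing mentioned in the abstract (Poisson duration model, gamma prior) can be illustrated with a minimal sketch. It is not the authors' QB/EM procedure: it assumes the state durations are observed directly (for example, frame counts taken from an alignment), whereas the paper treats them as hidden and uses EM. The class name `GammaPrior`, its fields, and the numbers used are hypothetical. Under that assumption, a prior Gamma(shape, rate) on the Poisson rate becomes Gamma(shape + sum of durations, rate + number of durations) after each batch, and the posterior is reused as the prior for the next batch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GammaPrior:
    """Gamma(shape, rate) density over a Poisson duration rate (hypothetical helper)."""

    shape: float  # accumulates the sum of observed durations
    rate: float   # accumulates the number of observed durations

    def update(self, durations: Iterable[int]) -> "GammaPrior":
        # Conjugate update, ASSUMING durations are fully observed
        # (the paper handles hidden durations with EM instead).
        observed = list(durations)
        return GammaPrior(self.shape + sum(observed), self.rate + len(observed))

    def mode(self) -> float:
        # Posterior mode of the Poisson rate; defined only for shape >= 1.
        if self.shape < 1:
            raise ValueError("mode is undefined for shape < 1")
        return (self.shape - 1) / self.rate

    def mean(self) -> float:
        # Posterior mean of the Poisson rate.
        return self.shape / self.rate


# Illustrative numbers only: durations in frames for one HMM state.
prior = GammaPrior(shape=2.0, rate=1.0)
first_batch = [3, 5, 4]
second_batch = [6, 2]

# Sequential learning: the posterior after one batch is the prior for the next.
sequential = prior.update(first_batch).update(second_batch)

# Batch learning on the pooled data reaches the same posterior statistics.
batch = prior.update(first_batch + second_batch)
```

In this simplified setting the sequential and batch posteriors coincide because the gamma statistics are additive; only two numbers per state need to be stored between adaptation steps.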
Keywords :
millimeter wave , rectangular waveguide (RWG) , waveguide transition , low-temperature co-fired ceramic (LTCC) , Laminated waveguide
Journal title :
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
Serial Year :
2003
Record number :
86931