Regional Information Center for Science and Technology - Variational Bayesian estimation and clustering for speech recognition

DocumentCode :

1010054

Title :

Variational Bayesian estimation and clustering for speech recognition

Author :

Watanabe, Shinji ; Minami, Yasuhiro ; Nakamura, Atsushi ; Ueda, Naonori

Author_Institution :

NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan

Volume :

Issue :

fYear :

2004

fDate :

7/1/2004

Firstpage :

365

Lastpage :

381

Abstract :

In this paper, we propose variational Bayesian estimation and clustering for speech recognition (VBEC), which is based on the variational Bayesian (VB) approach. VBEC is a total Bayesian framework: all speech recognition procedures (acoustic modeling and speech classification) are based on VB posterior distributions, unlike the maximum likelihood (ML) approach, which is based on ML parameter estimates. The total Bayesian framework offers two major advantages over the ML approach in mitigating over-training effects: it can select an appropriate model structure without any condition on the data set size, and it can classify categories robustly using a predictive posterior distribution. Using these advantages, VBEC 1) allows the automatic construction of acoustic models along two separate dimensions, namely, clustering triphone hidden Markov model states and determining the number of Gaussians, and 2) enables robust speech classification based on Bayesian predictive classification using VB posterior distributions. The capabilities of the VBEC functions were confirmed in large vocabulary continuous speech recognition experiments for read and spontaneous speech tasks. The experiments confirmed that VBEC automatically constructed accurate acoustic models and robustly classified speech, i.e., it totally mitigated the over-training effects and achieved high word accuracies through the VBEC functions.
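
The contrast the abstract draws between ML parameters and a predictive posterior distribution can be illustrated with a toy case. The sketch below is not the VBEC method: it assumes a single one-dimensional Gaussian with known noise variance and a conjugate Gaussian prior on the mean, whereas the paper works with VB posteriors over HMM and Gaussian mixture parameters. All function and parameter names are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def ml_plugin_logpdf(x, data, noise_var):
    """ML approach: plug the sample mean in as if it were the true mean."""
    ml_mean = sum(data) / len(data)
    return normal_logpdf(x, ml_mean, noise_var)


def bayes_predictive_logpdf(x, data, noise_var, prior_mean, prior_var):
    """Bayesian approach: integrate the mean out over its posterior.

    Assumes a conjugate prior N(prior_mean, prior_var) on the mean and a
    known noise variance, so the predictive is again Gaussian with the
    posterior variance of the mean added to the noise variance.
    """
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return normal_logpdf(x, post_mean, noise_var + post_var)
```

With a single training sample, the predictive density is wider than the ML plug-in density, so a test point far from that sample is penalized less severely. As the amount of data grows, the posterior variance of the mean shrinks and the two scores converge, which is the usual intuition for why a predictive posterior is more robust when data are sparse.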

Keywords :

Bayes methods; hidden Markov models; maximum likelihood estimation; pattern clustering; speech recognition; Bayesian clustering; acoustic modeling; clustering triphone hidden Markov model states; data set size condition; maximum likelihood approach; over-training effects; posterior distribution; robust speech classification; speech classification; variational Bayesian estimation; Bayesian methods; Gaussian distribution; Gaussian processes; parameter estimation; predictive models; robustness; vocabulary

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/TSA.2004.828640

Filename :

1306510

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1010054