Improved Semi-Parametric Mean Trajectory Model Using Discriminatively Trained Centroids

Author

Xu, Ran ; Pan, Jielin ; Yan, Yonghong

Author_Institution

ThinkIT Speech Lab., Inst. of Acoust., Chinese Acad. of Sci., Beijing, China

fYear

2008

fDate

16-19 Dec. 2008

Firstpage

1

Lastpage

4

Abstract

To alleviate the limitation imposed by the "state output probability conditional independence" assumption of hidden Markov models (HMMs) in speech recognition, a discriminative semi-parametric trajectory model has been proposed in recent years, in which both the means and the variances of the acoustic models are modeled as time-varying variables. The time-varying information is modeled as a weighted contribution from all the "centroids", which can be viewed as a representation of the acoustic space. In the previous literature, such centroids are usually obtained either by clustering the Gaussians of the baseline acoustic models down to a reasonable number or by training a baseline model with fewer Gaussian components. Centroids obtained in this way are maximum likelihood estimates of the acoustic space and are therefore weaker in discriminability than discriminatively trained acoustic models. In this paper, we propose an improved training framework for the semi-parametric mean trajectory model, in which the centroids are first discriminatively trained with the minimum phone error criterion to provide a more discriminative representation of the acoustic space. The method is evaluated on a Mandarin digit string recognition task. Experimental results show that the proposed method achieves a relative string error rate reduction of 7.5% over the traditional discriminative semi-parametric trajectory model and a relative string error rate reduction of 28.6% over the baseline acoustic model trained with the maximum likelihood criterion.
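
The idea of a time-varying mean built from a weighted contribution of all centroids can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so everything below is an assumption made for illustration: the centroids are taken to be diagonal-covariance Gaussians, the weights are taken to be the centroid posteriors given the current observation, and each centroid contributes a hypothetical additive offset vector (`offsets`) to a static Gaussian mean. The function and parameter names are invented here and are not taken from the paper.

```python
import numpy as np


def centroid_posteriors(obs, centroid_means, centroid_vars, centroid_priors):
    """Posterior probability of each centroid given one observation.

    Assumption: centroids are diagonal-covariance Gaussians.
    Shapes: obs (D,), centroid_means (K, D), centroid_vars (K, D),
    centroid_priors (K,).
    """
    diff = obs - centroid_means
    # Log-likelihood of the observation under each diagonal Gaussian.
    log_lik = -0.5 * np.sum(
        diff ** 2 / centroid_vars + np.log(2.0 * np.pi * centroid_vars), axis=1
    )
    log_post = np.log(centroid_priors) + log_lik
    # Subtract the maximum before exponentiating for numerical stability.
    log_post -= np.max(log_post)
    post = np.exp(log_post)
    return post / post.sum()


def time_varying_mean(static_mean, obs, centroid_means, centroid_vars,
                      centroid_priors, offsets):
    """Static mean plus a posterior-weighted sum of per-centroid offsets.

    Assumption: `offsets` (K, D) are hypothetical per-centroid bias vectors;
    the actual model in the paper may be parameterized differently.
    """
    post = centroid_posteriors(obs, centroid_means, centroid_vars, centroid_priors)
    return static_mean + post @ offsets
```

Under these assumptions, an observation that lies close to one centroid shifts the mean almost entirely by that centroid's offset, while an observation between centroids receives a blend of offsets. Replacing maximum likelihood centroids with discriminatively trained ones, as the abstract proposes, would change only the centroid parameters passed to these functions.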

Keywords

hidden Markov models; maximum likelihood estimation; speech recognition; discriminatively trained centroids; semi-parametric mean trajectory model; time-varying information; Acoustics; Error analysis; Gaussian processes; Laboratories; Mutual information; Radio access networks; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on

Conference_Location

Kunming

Print_ISBN

978-1-4244-2942-4

Electronic_ISBN

978-1-4244-2943-1

Type

conf

DOI

10.1109/CHINSL.2008.ECP.63

Filename

4730317