• DocumentCode
    2066214
  • Title
    Improved Semi-Parametric Mean Trajectory Model Using Discriminatively Trained Centroids
  • Author
    Xu, Ran ; Pan, Jielin ; Yan, Yonghong
  • Author_Institution
    ThinkIT Speech Lab., Inst. of Acoust., Chinese Acad. of Sci., Beijing, China
  • fYear
    2008
  • fDate
    16-19 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    To alleviate the limitation of the "state output probability conditional independence" assumption made by hidden Markov models (HMMs) in speech recognition, a discriminative semi-parametric trajectory model was proposed in recent years, in which both the means and the variances of the acoustic models are modeled as time-varying variables. The time-varying information is modeled as a weighted contribution from all the "centroids", which can be viewed as a representation of the acoustic space. In previous work, such centroids are usually obtained either by clustering the Gaussians in the baseline acoustic models down to some reasonable number or by training a baseline model with fewer Gaussian components. Centroids obtained in this way are maximum likelihood estimates of the acoustic space, and are relatively weak in discriminability compared with discriminatively trained acoustic models. In this paper, we propose an improved semi-parametric mean trajectory model training framework, in which the centroids are first discriminatively trained with the minimum phone error criterion to provide a more discriminative representation of the acoustic space. The method was evaluated on a Mandarin digit string recognition task. Experimental results show that the proposed method achieves a relative string error rate reduction of 7.5% compared with the traditional discriminative semi-parametric trajectory model, and a relative string error rate reduction of 28.6% compared with the baseline acoustic model trained with the maximum likelihood criterion.
  • Keywords
    hidden Markov models; maximum likelihood estimation; speech recognition; discriminatively trained centroids; semi-parametric mean trajectory model; time-varying information; Acoustics; Error analysis; Gaussian processes; Laboratories; Mutual information; Radio access networks; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-2942-4
  • Electronic_ISBN
    978-1-4244-2943-1
  • Type
    conf
  • DOI
    10.1109/CHINSL.2008.ECP.63
  • Filename
    4730317