• DocumentCode
    1188312
  • Title

    Subspace constrained Gaussian mixture models for speech recognition

  • Author

    Axelrod, Scott ; Goel, Vaibhava ; Gopinath, Ramesh A. ; Olsen, Peder A. ; Visweswariah, Karthik

  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    13
  • Issue
    6
  • fYear
    2005
  • Firstpage
    1144
  • Lastpage
    1160
  • Abstract
    A standard approach to automatic speech recognition uses hidden Markov models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, maximum likelihood linear transformation, or extended maximum likelihood linear transformation), as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper, we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small-vocabulary, resource-constrained, grammar-based tasks and large-vocabulary, unconstrained-resource tasks to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.
  • Keywords
    Gaussian distribution; computational complexity; covariance matrices; error statistics; grammars; hidden Markov models; maximum likelihood estimation; speech recognition; vocabulary; Gaussian mixture model; acoustic vector; automatic speech recognition; covariance modeling; grammar-based task; hidden Markov model; linear constraint; linear monomial; maximum likelihood criterion; precision matrix; quadratic monomial; word error rate; Covariance matrix; Error analysis; Linear discriminant analysis; Subspace constraints; Vectors; exponential family; maximum likelihood estimation of SCGMM; subspace constrained Gaussian mixture models (SCGMMs); subspace constrained exponential models
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/TSA.2005.851965
  • Filename
    1518915