  • DocumentCode
    417257
  • Title
    Basis superposition precision matrix modelling for large vocabulary continuous speech recognition
  • Author
    Sim, K.C. ; Gales, M.J.F.
  • Author_Institution
    Dept. of Eng., Cambridge Univ., UK
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    An important aspect of using Gaussian mixture models in HMM-based speech recognition systems is the form of the covariance matrix. One successful approach has been to model the inverse covariance (precision) matrix by superimposing multiple bases. This paper presents a general framework of basis superposition. Models are described in terms of parameter tying of the basis coefficients and restrictions on the number of bases. Two forms of parameter tying are described, each of which provides a compact model structure. The first constrains the basis coefficients over multiple basis vectors (or matrices). This is related to the Subspace for Precision and Mean (SPAM) model. The second constrains the basis coefficients over multiple components, yielding as one example heteroscedastic LDA (HLDA). Both maximum likelihood and minimum phone error training of these models are discussed. The performance of various configurations is examined on a conversational telephone speech task, SwitchBoard.
  • Keywords
    Gaussian distribution; covariance matrices; hidden Markov models; matrix inversion; maximum likelihood estimation; speech processing; speech recognition; telephony; vocabulary; Gaussian mixture models; HLDA; HMM-based speech recognition systems; SPAM model; Subspace for Precision and Mean model; SwitchBoard; basis superposition; compact model structure; conversational telephone speech task; heteroscedastic LDA; inverse covariance matrix; large vocabulary continuous speech recognition; maximum likelihood training; minimum phone error training; multiple basis vectors; parameter tying; performance; precision matrix modelling; Covariance matrix; Hidden Markov models; Inverse problems; Linear discriminant analysis; Maximum likelihood estimation; Speech recognition; Symmetric matrices; Telephony; Unsolicited electronic mail; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326107
  • Filename
    1326107
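
The basis superposition idea summarized in the abstract can be illustrated with a minimal sketch, assuming the precision matrix of one Gaussian component is a weighted sum of shared symmetric basis matrices. The function names (`outer`, `superpose`), the choice of rank-1 bases, and all numeric values are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
def outer(a, b):
    """Outer product of two vectors given as lists; returns a nested list."""
    return [[x * y for y in b] for x in a]


def superpose(coeffs, bases):
    """Return sum_i coeffs[i] * bases[i] for square matrices stored as nested lists."""
    d = len(bases[0])
    total = [[0.0] * d for _ in range(d)]
    for lam, basis in zip(coeffs, bases):
        for r in range(d):
            for c in range(d):
                total[r][c] += lam * basis[r][c]
    return total


# Hypothetical 2-dimensional example with two rank-1 bases a_i a_i^T
# shared by all components (an assumption made for this sketch).
a1 = [1.0, 0.0]
a2 = [1.0, 1.0]
bases = [outer(a1, a1), outer(a2, a2)]

# Component-specific basis coefficients (illustrative values).
coeffs_m = [2.0, 0.5]

# Precision matrix of component m as a superposition of the shared bases.
precision_m = superpose(coeffs_m, bases)
print(precision_m)
```

In this sketch only the coefficients are specific to a component while the bases are shared, which is one way a compact model structure can arise. How the coefficients are tied, and how many bases are used, is what distinguishes the configurations named in the abstract.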