• DocumentCode
    2253765
  • Title
    Modeling long term variability information in a mixture stochastic trajectory framework
  • Author
    Gong, Yifan ; Illina, Irina ; Haton, Jean-Paul
  • Author_Institution
    Inst. Nat. de Recherche en Inf. et Autom., Vandoeuvre-les-Nancy, France
  • Volume
    1
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    334
  • Abstract
    The problem of acoustic modeling for speech recognizers is addressed. We distinguish two types of speech variability, long term (speaker identity, stationary noise, channel distortion) and short term (phoneme class). Currently, most recognizers model the two variabilities without considering their specificities, which may result in flat distributions with limited discriminability. In our system, the long term variability (environment) is modeled by a mixture model, where each mixture is modeled by a mixture stochastic trajectory model (MSTM). We propose the environment dependent mixture stochastic trajectory model (ED-MSTM) to model a set of environments. The parameters of ED-MSTM are estimated using the maximum likelihood (ML) estimation criterion by the expectation-maximisation (EM) algorithm. Our model has been tested on a 1011-word vocabulary, multi-speaker continuous French recognition task with noisy speech. In the experiments, we assume that speakers can be grouped into a pre-determined number of classes and that the class label of a speaker is missing. The use of environmental modeling cut down the error rate produced by the multi-speaker system by about 15%, which is a statistically significant improvement. The idea of environment modeling is applicable to other acoustic modeling techniques such as hidden Markov models.
  • Keywords
    errors; hidden Markov models; maximum likelihood estimation; noise; speech recognition; stochastic processes; acoustic modeling; channel distortion; environmental modeling; error rate; expectation-maximisation algorithm; hidden Markov models; long term variability information; maximum likelihood estimation; mixture stochastic trajectory model; multi-speaker continuous French recognition; noisy speech; parameter estimation; phoneme class; speaker identity; speech recognition; speech variability; stationary noise; vocabulary; Acoustic distortion; Acoustic noise; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Speech enhancement; Speech recognition; Stochastic processes; Stochastic resonance; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607121
  • Filename
    607121
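
Note on the abstract: the environment mixture with a missing class label can be illustrated with a minimal sketch. It is not the authors' ED-MSTM implementation. It only shows the generic mixture computation the abstract implies: the utterance likelihood is a prior-weighted sum over environments, and EM re-estimates the environment priors from the posteriors. The per-environment log-likelihoods would come from each environment's trajectory model in the paper; here they are placeholder numbers, and all function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def environment_posteriors(log_priors, log_likelihoods):
    """E-step for one utterance X.

    log_priors[e]      : log P(environment e)
    log_likelihoods[e] : log p(X | environment e), assumed to be supplied
                         by some per-environment acoustic model
    Returns (posteriors P(e | X), marginal log p(X)).
    """
    joint = [lp + ll for lp, ll in zip(log_priors, log_likelihoods)]
    total = log_sum_exp(joint)
    return [math.exp(j - total) for j in joint], total


def update_priors(posteriors_per_utterance):
    """M-step for the mixture weights: average the posteriors over utterances."""
    n = len(posteriors_per_utterance)
    k = len(posteriors_per_utterance[0])
    return [sum(p[e] for p in posteriors_per_utterance) / n for e in range(k)]


# Two hypothetical environments with equal priors and made-up likelihoods.
log_priors = [math.log(0.5), math.log(0.5)]
posteriors, log_px = environment_posteriors(log_priors, [-10.0, -12.0])
new_priors = update_priors([posteriors, [0.4, 0.6]])
```

Because the class label is treated as hidden, recognition under this sketch would score an utterance with the marginal `log_px` rather than committing to a single environment.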