• DocumentCode
    1654525
  • Title

    Incorporation of time-varying LP cepstral features in HMM-based isolated word speech recognition

  • Author

    Ang, Federico ; Tsutsui, Hiroshi ; Miyanaga, Yoshikazu

  • Author_Institution
    ICN Lab., Hokkaido Univ., Sapporo, Japan
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Current state-of-the-art automatic continuous speech recognition systems have enjoyed huge leaps in accuracy using speech features that assume stationarity in the signals being processed. However, this performance can often be attributed to the inclusion of lexical information. For isolated word tasks, without a priori models for the expected words, the static speech representation breaks down. For example, words that differ in only one unvoiced part are often misrecognized. Thus, time-varying speech representations have become a topic of interest in the field. This paper is concerned with the use of simple time-varying features based on autoregressive modeling of speech, which provides high-resolution features. In particular, it examines how these high-resolution features fit into a finite-length Hidden Markov Model-based acoustic model that was originally used for static features. The resulting performance is compared with that of the best-performing static features (Mel-Frequency Cepstral Coefficients), and while the approach is currently viewed as suboptimal, ample room for improvement is also emphasized.
  • Keywords
    autoregressive processes; cepstral analysis; feature extraction; hidden Markov models; linear predictive coding; signal representation; speech recognition; HMM-based isolated word speech recognition; automatic speech recognition system; autoregressive modeling; continuous speech recognition system; finite-length hidden Markov model-based acoustic model; isolated word tasks; time-varying LP cepstral feature improvement; time-varying speech representations; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; isolated word speech recognition; time-varying AR model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Circuits and Systems (ISSCS), 2015 International Symposium on
  • Conference_Location
    Iasi
  • Print_ISBN
    978-1-4673-7487-3
  • Type

    conf

  • DOI
    10.1109/ISSCS.2015.7204030
  • Filename
    7204030