• DocumentCode
    3244951
  • Title
    Frequency-domain linear prediction for temporal features
  • Author
    Athineos, Marios ; Ellis, Daniel P W
  • Author_Institution
    Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
  • fYear
    2003
  • fDate
    30 Nov.-3 Dec. 2003
  • Firstpage
    261
  • Lastpage
    266
  • Abstract
    Current speech recognition systems uniformly employ short-time spectral analysis, usually over windows of 10-30 ms, as the basis for their acoustic representations. Any detail below this timescale is lost, and even temporal structures above this level are usually only weakly represented in the form of deltas etc. We address this limitation by proposing a novel representation of the temporal envelope in different frequency bands by exploring the dual of conventional linear prediction (LPC) when applied in the transform domain. With this technique of frequency-domain linear prediction (FDLP), the 'poles' of the model describe temporal, rather than spectral, peaks. By using analysis windows on the order of hundreds of milliseconds, the procedure automatically decides how to distribute poles to model the temporal structure best within the window. While this approach offers many possibilities for novel speech features, we experiment with one particular form, an index describing the 'sharpness' of individual poles within a window, and show a relatively large word error rate improvement from 4.97% to 3.81% in a recognizer trained on general conversational telephone speech and tested on a small-vocabulary spontaneous numbers task. We analyze this improvement in terms of the confusion matrices and suggest how the newly-modeled fine temporal structure may be helping.
  • Keywords
    duality (mathematics); error statistics; frequency-domain analysis; matrix algebra; poles and zeros; signal representation; spectral analysis; speech recognition; 10 to 30 ms; acoustic representations; confusion matrices; conversational telephone speech; frequency-domain linear prediction; poles; short-time spectral analysis; small-vocabulary spontaneous numbers task; spectral peaks; speech recognition; temporal envelope; temporal features; temporal peaks; word error rate; Acoustic testing; Automatic speech recognition; Discrete cosine transforms; Error analysis; Frequency domain analysis; Linear predictive coding; Predictive models; Spectral analysis; Speech recognition; Telephony;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
  • Print_ISBN
    0-7803-7980-2
  • Type
    conf
  • DOI
    10.1109/ASRU.2003.1318451
  • Filename
    1318451
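
  • Illustration
    The abstract describes FDLP as the dual of conventional LPC: linear prediction is fitted to a transform of the signal, so the model's poles mark temporal rather than spectral peaks. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a DCT as the transform (the keywords list discrete cosine transforms), works on the full band rather than separate frequency bands, and does not compute the pole 'sharpness' index. The function names dct_ii, levinson and fdlp_envelope are hypothetical.

```python
import cmath
import math


def dct_ii(x):
    """Unnormalised DCT-II computed directly in O(N^2); adequate for a short demo."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation r -> (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err


def fdlp_envelope(x, order):
    """All-pole estimate of the temporal envelope of x (assumed sketch of FDLP)."""
    n = len(x)
    c = dct_ii(x)                           # move to the transform domain
    # Autocorrelation of the DCT coefficients (autocorrelation-method LPC).
    r = [sum(c[k] * c[k + lag] for k in range(n - lag))
         for lag in range(order + 1)]
    a, err = levinson(r, order)
    env = []
    for t in range(n):
        # A click at sample t appears in the DCT as a cosine of angular
        # frequency pi * (t + 0.5) / n, so the model is evaluated there.
        theta = math.pi * (t + 0.5) / n
        denom = sum(a[j] * cmath.exp(-1j * theta * j) for j in range(order + 1))
        env.append(err / abs(denom) ** 2)
    return env


# Usage: a single click at sample 100 inside a 256-sample window.
signal = [0.0] * 256
signal[100] = 1.0
envelope = fdlp_envelope(signal, order=2)
peak = max(range(len(envelope)), key=envelope.__getitem__)
print("envelope peaks near sample", peak)
```

    With a single click and two poles, the envelope should peak at or next to the click position. Longer windows and higher model orders let the fit place poles on several temporal events at once, which is the behaviour the abstract refers to.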