• DocumentCode
    2855929
  • Title
    The formant structure based feature parameter for speech recognition
  • Author
    Zhao, Junhui ; Kuang, Jingming ; Xie, Xiang
  • Author_Institution
    Res. Center of Digits Commun. Technol., Beijing Inst. of Technol., China
  • fYear
    2003
  • fDate
    28 Sept.-1 Oct. 2003
  • Firstpage
    605
  • Lastpage
    608
  • Abstract
    In this paper, we propose a new set of speech feature parameters based on formant structure information. The speech signal is first decomposed into sub-bands by a gammatone filterbank, and the Teager energy signal of each sub-band is then extracted with the Teager Energy Operator. Two energy separation algorithms, DESA-1 and DESA-2, are applied to obtain the instantaneous amplitude and frequency envelopes of the formants, respectively. Finally, the feature vector is constructed from the amplitude and frequency information of the formants. The motivation for this new feature is that formant location information is a quite distinctive speech representation that has seldom been applied in speech recognition systems, whereas conventional feature parameters such as mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) do not explicitly model spectral peak information, which is a very important cue for distinguishing different phones. A Mandarin digit string recognition task is used to evaluate the proposed feature parameter. The recognition results show improved speech recognition performance compared with conventional MFCC and LPCC features.
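    The front end described in the abstract (gammatone sub-band filtering, Teager energy, then DESA-1/DESA-2 energy separation) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the gammatone parameters (4th order, Glasberg and Moore ERB scale, bandwidth factor 1.019, 50 ms FIR truncation) and the helper names `gammatone_ir`, `teager`, `desa1` and `desa2` are hypothetical. The TEO and DESA expressions follow the standard discrete forms of Maragos, Kaiser and Quatieri. How the paper pools the per-band amplitude and frequency tracks into its final feature vector is not specified in the abstract and is not shown here.

```python
import numpy as np


def gammatone_ir(fc, fs, order=4, duration=0.05):
    """Hypothetical FIR approximation of a gammatone filter centred at fc (Hz).

    Assumed parameters: Glasberg-Moore ERB scale, bandwidth factor 1.019,
    impulse response truncated to `duration` seconds.
    """
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    # Normalise so the peak magnitude response (on a 4096-point grid) is 1.
    return g / np.max(np.abs(np.fft.rfft(g, 4096)))


def teager(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1], for n = 1 .. N-2."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa1(x, eps=1e-12):
    """DESA-1 (backward difference). Returns (amplitude, omega) for n = 2 .. N-3."""
    x = np.asarray(x, dtype=float)
    psi_x = teager(x)[1:-1]                 # Psi[x(n)], n = 2 .. N-3
    y = x[1:] - x[:-1]                      # y(n) = x(n) - x(n-1), n = 1 .. N-1
    psi_y = teager(y)                       # Psi[y(n)], n = 2 .. N-2
    # Psi[y(n)] + Psi[y(n+1)] combined with Psi[x(n)]
    g = 1.0 - (psi_y[:-1] + psi_y[1:]) / (4.0 * psi_x + eps)
    g = np.clip(g, -1.0, 1.0)
    omega = np.arccos(g)                    # rad/sample
    amp = np.sqrt(np.maximum(psi_x, 0.0) / np.maximum(1.0 - g ** 2, eps))
    return amp, omega


def desa2(x, eps=1e-12):
    """DESA-2 (symmetric difference). Returns (amplitude, omega) for n = 2 .. N-3.

    Valid for omega < pi/2, since arccos recovers 2 * omega.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager(x)[1:-1]                 # Psi[x(n)], n = 2 .. N-3
    y = x[2:] - x[:-2]                      # y(n) = x(n+1) - x(n-1), n = 1 .. N-2
    psi_y = teager(y)                       # Psi[y(n)], n = 2 .. N-3
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x + eps), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)          # rad/sample
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega


if __name__ == "__main__":
    fs = 8000
    n = np.arange(1600)
    tone = np.cos(2 * np.pi * 1000.0 * n / fs)      # 1 kHz test tone
    g = gammatone_ir(1000.0, fs)
    band = np.convolve(tone, g)[: len(tone)]         # one sub-band output
    amp, omega = desa2(band[len(g):])                # skip the filter transient
    print("mean amplitude:", amp.mean())
    print("mean frequency (Hz):", (omega * fs / (2 * np.pi)).mean())
```

For a clean sinusoid x(n) = A cos(Ωn + φ), both estimators return A and Ω up to rounding, which gives a convenient sanity check. On real speech, each band output would likely be framed and the estimates smoothed before being used as features.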
  • Keywords
    cepstral analysis; channel bank filters; feature extraction; natural languages; speech recognition; Mandarin digit string recognition task; feature vector; formant location information; formant structure; gammatone filterbank; linear prediction cepstral coefficients; mel-frequency cepstral coefficients; speech feature parameters; speech recognition; teager energy operator; teager energy signal; Automatic speech recognition; Cepstral analysis; Data mining; Frequency estimation; Frequency modulation; Mel frequency cepstral coefficient; Resonance; Signal processing; Speech processing; Speech recognition;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Statistical Signal Processing, 2003 IEEE Workshop on
  • Print_ISBN
    0-7803-7997-7
  • Type
    conf
  • DOI
    10.1109/SSP.2003.1289551
  • Filename
    1289551