• DocumentCode
    865624
  • Title
    Significance of the Modified Group Delay Feature in Speech Recognition
  • Author
    Hegde, Rajesh M. ; Murthy, Hema A. ; Gadde, Venkata Ramana Rao
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai
  • Volume
    15
  • Issue
    1
  • fYear
    2007
  • Firstpage
    190
  • Lastpage
    202
  • Abstract
    Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of the Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be computed directly from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech, owing to zeros close to the unit circle in the z-plane and to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.
  • Keywords
    Fourier transforms; feature extraction; speaker recognition; speech processing; Fourier transform magnitude; cepstral features; continuous-speech recognition; features extraction; group delay function; language recognition; modified group delay feature; phase spectra; phase spectrum; pitch periodicity effects; speech perception; speech spectral representation; Data mining; Delay effects; Resonance; Signal processing; Speech coding; Speech recognition; Wrapping; Class separability; Gaussian mixture models (GMMs); feature selection; hidden Markov models (HMMs); robustness
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.876858
  • Filename
    4032772
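The abstract states that the group delay function can be computed directly from the speech signal, avoiding phase unwrapping, and that cepstral features are taken from a modified version of it. The dependency-free sketch below illustrates this under stated assumptions. The plain group delay uses the standard identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X(w)|^2, where X is the DFT of x[n] and Y is the DFT of n*x[n]. The modification step is an assumption modeled on published descriptions of the MODGDF and is not given in this record: divide by a cepstrally smoothed spectrum S(w) raised to 2*gamma, compress with sign(tau)*|tau|^alpha, then take a DCT. All function names and the values alpha=0.4, gamma=0.9, lifter=8, nfft=64, and n_coeffs=13 are hypothetical illustration choices, not the paper's settings.

```python
import cmath
import math


def dft(x, nfft):
    """Direct O(N^2) DFT of x, zero-padded or truncated to nfft points."""
    x = list(x)[:nfft] + [0.0] * max(0, nfft - len(x))
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / nfft) for n in range(nfft))
        for k in range(nfft)
    ]


def group_delay(x, nfft):
    """Group delay without phase unwrapping: (X_R*Y_R + X_I*Y_I) / |X|^2,
    where X = DFT{x[n]} and Y = DFT{n * x[n]}."""
    X = dft(x, nfft)
    Y = dft([n * v for n, v in enumerate(x)], nfft)
    return [(a.real * b.real + a.imag * b.imag) / abs(a) ** 2 for a, b in zip(X, Y)]


def cepstral_smooth(X, lifter):
    """Smoothed magnitude spectrum S(w): keep only the low-quefrency part of
    the real cepstrum of log|X| (lifter should be between 1 and len(X) // 2)."""
    nfft = len(X)
    log_mag = [math.log(max(abs(v), 1e-12)) for v in X]
    # log|X| is real, so the real part of its inverse DFT is Re(DFT(log|X|)) / N.
    ceps = [c.real / nfft for c in dft(log_mag, nfft)]
    # Symmetric low-quefrency lifter: keep quefrencies 0..lifter-1 and their mirrors.
    liftered = [c if (q < lifter or q > nfft - lifter) else 0.0 for q, c in enumerate(ceps)]
    # The forward DFT of the liftered cepstrum is the smoothed log spectrum.
    return [math.exp(v.real) for v in dft(liftered, nfft)]


def modified_group_delay(x, nfft, alpha=0.4, gamma=0.9, lifter=8):
    """ASSUMED MODGDF-style modification with hypothetical defaults:
    tau = (X_R*Y_R + X_I*Y_I) / S^(2*gamma), then tau_m = sign(tau) * |tau|^alpha."""
    X = dft(x, nfft)
    Y = dft([n * v for n, v in enumerate(x)], nfft)
    S = cepstral_smooth(X, lifter)
    tau_m = []
    for a, b, s in zip(X, Y, S):
        tau = (a.real * b.real + a.imag * b.imag) / s ** (2 * gamma)
        tau_m.append(math.copysign(abs(tau) ** alpha, tau))
    return tau_m


def modgdf_cepstra(x, nfft=64, n_coeffs=13, **kwargs):
    """Cepstral features: unnormalized DCT-II of tau_m over bins 0..nfft/2."""
    tau_m = modified_group_delay(x, nfft, **kwargs)[: nfft // 2 + 1]
    M = len(tau_m)
    return [
        sum(t * math.cos(math.pi * k * (m + 0.5) / M) for m, t in enumerate(tau_m))
        for k in range(n_coeffs)
    ]


if __name__ == "__main__":
    # A pure delay of 5 samples has a constant group delay of 5 at every bin.
    impulse = [0.0] * 5 + [1.0]
    print([round(t, 6) for t in group_delay(impulse, 16)])
    # A damped resonance stands in for one windowed speech frame.
    frame = [0.95 ** n * math.cos(0.3 * math.pi * n) for n in range(40)]
    print([round(c, 3) for c in modgdf_cepstra(frame)])
```

The direct O(N^2) DFT keeps the sketch free of dependencies; a practical front end would instead use an FFT (for example numpy.fft.rfft) applied to each windowed frame.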