• DocumentCode
    3441934
  • Title
    The modified group delay function and its application to phoneme recognition
  • Author
    Murthy, Hema A.; Gadde, Vijay
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai, India
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    We explore a new spectral representation of speech signals based on group delay functions. Group delay functions by themselves are noisy and difficult to interpret because zeros that lie close to the unit circle in the z-domain clutter the spectra. A modified group delay function (Yegnanarayana, B. and Murthy, H.A., IEEE Trans. Sig. Processing, vol. 40, pp. 2281-9, 1992) that reduces the effects of zeros close to the unit circle is used. Assuming that this new function is minimum phase, the modified group delay spectrum is converted to a sequence of cepstral coefficients. A preliminary phoneme recogniser is built using features derived from these cepstra. Results are compared with those obtained from features derived from the traditional mel frequency cepstral coefficients (MFCC). The baseline MFCC performance is 34.7%, while that of the best modified group delay cepstrum is 39.2%. The performance of the composite MFCC feature, which includes the derivatives and double derivatives, is 60.7%, while that of the composite modified group delay feature is 57.3%. When these two composite features are combined, performance improves by ∼2% (to 62.8%). When this combined system is further combined with linear frequency cepstra (LFC) (Gadde, V.R.R. et al., The SRI SPINE 2001 Evaluation System. http://elazar.itd.nrl.navy.mil/spine/sri2/presentation/sri2001.html, 2001), performance improves by another ∼0.8% (to 63.6%).
  • Keywords
    Fourier transforms; cepstral analysis; feature extraction; poles and zeros; speech processing; speech recognition; Fourier transform phase function; MFCC; cepstral coefficients; double derivatives; feature extraction; mel frequency cepstral coefficients; modified group delay cepstrum; modified group delay function; phoneme recognition; speech signal spectral representation; zeroes; Application software; Cepstral analysis; Computer science; Delay; Fourier transforms; Mel frequency cepstral coefficient; Phase estimation; Speech analysis; Speech recognition; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198718
  • Filename
    1198718
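
The processing chain summarised in the abstract (group delay spectrum, a modification that suppresses the effect of zeros near the unit circle, then conversion to cepstra) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes a commonly cited formulation in which the group delay numerator is computed from the DFTs of x[n] and n·x[n], the denominator is a cepstrally smoothed magnitude spectrum raised to 2·gamma, and the result is compressed with an exponent alpha. The function names, the default values of `alpha`, `gamma` and `lifter`, and the use of a DCT-II to obtain cepstra are all assumptions made for this example. A naive DFT keeps the sketch dependency-free; a practical version would use an FFT library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cepstral_smooth(mag, lifter):
    """Smooth a magnitude spectrum by keeping only low-quefrency cepstra."""
    n = len(mag)
    log_mag = [math.log(max(m, 1e-12)) for m in mag]
    # The log magnitude of a real signal is real and even, so the real part
    # of its forward DFT divided by n equals its inverse DFT (the cepstrum).
    cep = [c.real / n for c in dft(log_mag)]
    # Symmetric low-time lifter: keep indices near 0 and their mirror images.
    kept = [c if (t < lifter or t > n - lifter) else 0.0
            for t, c in enumerate(cep)]
    return [math.exp(v.real) for v in dft(kept)]


def modified_group_delay(frame, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay spectrum of one frame (assumed formulation).

    alpha, gamma and lifter are hypothetical defaults, not values
    taken from the record above.
    """
    n = len(frame)
    X = dft(frame)
    Y = dft([t * frame[t] for t in range(n)])  # DFT of n * x[n]
    S = cepstral_smooth([abs(v) for v in X], lifter)
    out = []
    for xk, yk, sk in zip(X, Y, S):
        # The smoothed spectrum replaces |X|^2 in the denominator, so zeros
        # of X near the unit circle no longer produce large spikes.
        tau = (xk.real * yk.real + xk.imag * yk.imag) / (sk ** (2 * gamma))
        out.append(math.copysign(abs(tau) ** alpha, tau))
    return out


def dct_cepstra(spectrum, num_coeffs):
    """DCT-II of a spectrum; returns coefficients 0 .. num_coeffs - 1."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * math.cos(math.pi * c * (k + 0.5) / n)
            for k in range(n))
        for c in range(num_coeffs)
    ]


# Usage on a synthetic frame: a decaying sinusoid standing in for speech.
frame = [math.exp(-0.05 * t) * math.sin(0.6 * t) for t in range(64)]
mgd = modified_group_delay(frame)
cepstra = dct_cepstra(mgd[: len(frame) // 2], 12)
```

With `alpha = 1` and `gamma = 1` and no smoothing effect, the function reduces to the ordinary group delay, which for a unit impulse delayed by n0 samples is n0 at every frequency. That property gives a simple sanity check for the numerator and denominator.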