  • DocumentCode
    319597
  • Title
    Auditory masking based acoustic front-end for robust speech recognition
  • Author
    Paliwal, K.K. ; Lilly, B.T.
  • Author_Institution
    Sch. of Microelectron. Eng., Griffith Univ., Brisbane, Qld., Australia
  • Volume
    1
  • fYear
    1997
  • fDate
    4-4 Dec. 1997
  • Firstpage
    165
  • Abstract
    This paper presents an acoustic front-end that uses the properties of auditory masking to extract acoustic features from the speech signal. Using the properties of simultaneous masking found in the human auditory system, we compute a masking threshold as a function of frequency for a given speech frame from its power spectrum. Portions of the power spectrum that fall below this masking threshold are not heard by the human auditory system due to masking effects and hence can be discarded. These portions are replaced by the corresponding portions of the masking threshold spectrum. The modified power spectrum is then processed by linear prediction analysis or homomorphic analysis to derive cepstral features for each speech frame. We study the performance of this front-end for speech recognition in noisy environments. The front-end performs significantly better than conventional linear prediction or homomorphic analysis based front-ends on noisy speech. In terms of signal-to-noise ratio, simultaneous masking offers an advantage of more than 5 dB over the LPCC front-end in isolated word recognition experiments and 3 dB in continuous speech recognition experiments.
  • Keywords
    acoustic signal processing; cepstral analysis; feature extraction; hearing; noise; satellite computers; speech processing; speech recognition; LPCC front-end; SNR; acoustic features extraction; acoustic front-end; auditory masking; cepstral features; continuous speech recognition; homomorphic analysis; human auditory system; isolated word recognition; linear prediction analysis; masking threshold; noisy environments; performance; power spectrum; signal-to-noise ratio; simultaneous masking; speech frame frequency; speech recognition; speech signal; Auditory system; Cepstral analysis; Feature extraction; Humans; Masking threshold; Robustness; Speech analysis; Speech coding; Speech recognition; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications., Proceedings of IEEE
  • Conference_Location
    Brisbane, Qld., Australia
  • Print_ISBN
    0-7803-4365-4
  • Type
    conf
  • DOI
    10.1109/TENCON.1997.647283
  • Filename
    647283
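
The processing chain described in the abstract (power spectrum, masking threshold, replacement of sub-threshold portions, cepstral analysis) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the masking threshold is computed, so the Bark-scale formula, the triangular spreading slopes, the 15 dB offset, and all function names below are assumptions chosen for demonstration. Only the homomorphic (log-spectrum) route to cepstra is shown; the linear prediction route is omitted.

```python
import numpy as np


def hz_to_bark(f_hz):
    # Zwicker-style Hz-to-Bark approximation (assumed; the abstract names no scale).
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)


def frame_power_spectrum(frame):
    # Hamming-windowed power spectrum of one speech frame.
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed)) ** 2


def masking_threshold(power, sample_rate, offset_db=15.0,
                      lower_slope_db=25.0, upper_slope_db=10.0):
    # Toy simultaneous-masking threshold (hypothetical stand-in for the paper's model).
    # Every bin acts as a masker whose effect decays linearly in dB per Bark,
    # more steeply toward lower frequencies than toward higher ones.
    n_bins = power.shape[0]
    # Bin centre frequencies; matches rfft bins for an even frame length.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bark = hz_to_bark(freqs)
    # dz[i, j] = Bark position of maskee i minus Bark position of masker j.
    dz = bark[:, None] - bark[None, :]
    atten_db = np.where(dz >= 0, upper_slope_db * dz, -lower_slope_db * dz)
    level_db = 10.0 * np.log10(np.maximum(power, 1e-12))
    # Contribution of masker j at maskee i, lowered by a fixed offset.
    contrib_db = level_db[None, :] - atten_db - offset_db
    # Take the strongest masker at each bin.
    threshold_db = contrib_db.max(axis=1)
    return 10.0 ** (threshold_db / 10.0)


def cepstrum_from_power(power, n_ceps=12):
    # Homomorphic analysis: inverse FFT of the log power spectrum.
    log_power = np.log(np.maximum(power, 1e-12))
    cepstrum = np.fft.irfft(log_power)
    # Skip c0 (overall energy) and keep the next n_ceps coefficients.
    return cepstrum[1:n_ceps + 1]


def masked_cepstra(frame, sample_rate, n_ceps=12):
    power = frame_power_spectrum(frame)
    threshold = masking_threshold(power, sample_rate)
    # Replace portions below the threshold with the threshold itself.
    modified = np.maximum(power, threshold)
    return cepstrum_from_power(modified, n_ceps)


if __name__ == "__main__":
    # Synthetic test frame: a 1 kHz tone plus weak white noise, 256 samples at 8 kHz.
    rng = np.random.default_rng(0)
    t = np.arange(256) / 8000.0
    frame = np.sin(2.0 * np.pi * 1000.0 * t) + 0.01 * rng.standard_normal(256)
    print(masked_cepstra(frame, 8000))
```

Replacing sub-threshold portions with the threshold is an elementwise maximum of the two spectra, which is why `np.maximum` suffices for that step. In this toy model the low-level noise bins near the tone are raised to the threshold, so the cepstra depend less on the noise floor; how closely this mirrors the paper's threshold model is not something the abstract allows one to confirm.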