DocumentCode
319597
Title
Auditory masking based acoustic front-end for robust speech recognition
Author
Paliwal, K.K. ; Lilly, B.T.
Author_Institution
Sch. of Microelectron. Eng., Griffith Univ., Brisbane, Qld., Australia
Volume
1
fYear
1997
fDate
4-4 Dec. 1997
Firstpage
165
Abstract
This paper presents an acoustic front-end that exploits auditory masking to extract acoustic features from the speech signal. Using the properties of simultaneous masking in the human auditory system, we compute a masking threshold as a function of frequency for each speech frame from its power spectrum. Portions of the power spectrum that fall below this masking threshold are not heard by the human auditory system and can therefore be discarded; they are replaced by the corresponding portions of the masking threshold spectrum. The modified power spectrum is then processed by linear prediction analysis or homomorphic analysis to derive cepstral features for each speech frame. We study the performance of this front-end for speech recognition in noisy environments, where it performs significantly better than conventional front-ends based on linear prediction or homomorphic analysis. In terms of signal-to-noise ratio, simultaneous masking offers an advantage of more than 5 dB over the LPCC front-end in isolated word recognition experiments and 3 dB in continuous speech recognition experiments.
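The spectral replacement step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the masking threshold is computed, so `masking_threshold` below is a hypothetical toy model: each frequency bin is assumed to mask its neighbours at a level `offset_db` below its own, falling by `slope_db` per bin of distance. The function names and both parameter values are assumptions made for illustration, not the authors' method. Only `apply_masking` follows the abstract directly: values below the threshold are replaced by the threshold.

```python
import math


def masking_threshold(power, slope_db=10.0, offset_db=12.0):
    """Toy simultaneous-masking threshold, one value per frequency bin.

    Assumption (not from the paper): every bin j acts as a masker whose
    effect at bin k is its level minus offset_db, reduced by slope_db for
    each bin of distance |k - j|. The threshold at k is the strongest of
    these contributions.
    """
    # Convert power to dB, guarding against log of zero.
    level_db = [10.0 * math.log10(max(p, 1e-12)) for p in power]
    n = len(level_db)
    threshold_db = [
        max(level_db[j] - offset_db - slope_db * abs(k - j) for j in range(n))
        for k in range(n)
    ]
    # Back to the linear power domain.
    return [10.0 ** (t / 10.0) for t in threshold_db]


def apply_masking(power, threshold):
    """Replace spectral values below the threshold by the threshold."""
    return [max(p, t) for p, t in zip(power, threshold)]


# Hypothetical four-bin power spectrum with one strong component.
power = [1.0, 1000.0, 1.0, 1e-6]
threshold = masking_threshold(power)
modified = apply_masking(power, threshold)
```

In this toy frame the strong component in bin 1 is left unchanged, while the weaker neighbouring bins are raised to the threshold. The modified spectrum would then be passed to linear prediction or homomorphic analysis to obtain cepstral features, a stage this sketch does not cover.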
Keywords
acoustic signal processing; cepstral analysis; feature extraction; hearing; noise; satellite computers; speech processing; speech recognition; LPCC front-end; SNR; acoustic features extraction; acoustic front-end; auditory masking; cepstral features; continuous speech recognition; homomorphic analysis; human auditory system; isolated word recognition; linear prediction analysis; masking threshold; noisy environments; performance; power spectrum; signal-to-noise ratio; simultaneous masking; speech frame frequency; speech recognition; speech signal; Auditory system; Cepstral analysis; Feature extraction; Humans; Masking threshold; Robustness; Speech analysis; Speech coding; Speech recognition; Working environment noise;
fLanguage
English
Publisher
IEEE
Conference_Titel
TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications, Proceedings of IEEE
Conference_Location
Brisbane, Qld., Australia
Print_ISBN
0-7803-4365-4
Type
conf
DOI
10.1109/TENCON.1997.647283
Filename
647283