DocumentCode
1843640
Title
Modeling auditory perception to improve robust speech recognition
Author
Strope, Brian; Alwan, Abeer
Author_Institution
Department of Electrical Engineering, University of California, Los Angeles, CA, USA
Volume
2
fYear
1997
fDate
2-5 Nov. 1997
Firstpage
1056
Abstract
While non-stationary stochastic techniques have led to substantial improvements in vocabulary size and speaker independence, most automatic speech recognition (ASR) systems remain overly sensitive to the acoustic environment, precluding robust, widespread application. Our approach to this problem has been to model fundamental aspects of auditory perception, typically neglected in common ASR front ends, to derive a more robust and phonetically relevant parameterization of speech. Short-term adaptation and recovery, sensitivity to local spectral peaks, and an explicit parameterization of the position and motion of those peaks together reduce the error rate of a word recognition task by as much as a factor of 4. Current work also investigates the perceptual significance of pitch-rate amplitude-modulation cues in noise.
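The abstract gives no implementation details, so the following Python sketch is only a rough illustration of the kind of front end it describes: a sensitivity to local spectral peaks combined with explicit peak-position and peak-motion parameters. The frame length, number of peaks, prominence threshold, and the use of scipy.signal.find_peaks are assumptions for illustration, not the authors' method, and the paper's short-term adaptation/recovery stage and pitch-rate amplitude-modulation cues are not modeled here.

import numpy as np
from scipy.signal import find_peaks

def local_peak_features(frames, sample_rate, n_fft=512, n_peaks=3):
    """Illustrative front end (not the paper's algorithm): for each frame,
    keep the frequencies of the most prominent local spectral peaks
    (position) and their frame-to-frame change (motion)."""
    positions = []
    for frame in frames:
        # short-time log-magnitude spectrum of a windowed frame
        spectrum = 20 * np.log10(
            np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) + 1e-10)
        peaks, props = find_peaks(spectrum, prominence=3.0)
        if len(peaks) == 0:
            positions.append(np.zeros(n_peaks))
            continue
        # keep the n_peaks most prominent peaks, ordered by frequency (Hz)
        order = np.argsort(props["prominences"])[::-1][:n_peaks]
        freqs = np.sort(peaks[order]) * sample_rate / n_fft
        positions.append(np.pad(freqs, (0, n_peaks - len(freqs))))
    positions = np.array(positions)
    # frame-to-frame movement of each peak frequency
    motion = np.diff(positions, axis=0, prepend=positions[:1])
    return np.hstack([positions, motion])

Applied to a sequence of short (e.g., 20-30 ms) frames taken at a 10 ms hop, this returns one row of peak-position and peak-motion features per frame, the sort of parameterization that could then feed a conventional HMM recognizer.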
Keywords
amplitude modulation; hearing; spectral analysis; speech recognition; ASR front ends; acoustic environment; auditory perception modelling; automatic speech recognition; error rate reduction; local spectral peaks sensitivity; noise; phonetically relevant parameterization; pitch-rate amplitude-modulation cues; robust speech recognition; short-term adaptation; short-term recovery; word recognition task; Automatic speech recognition; Discrete cosine transforms; Filters; Frequency estimation; Hidden Markov models; Robustness; Signal processing; Spectrogram; Speech recognition; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Title
Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, 1997
Conference_Location
Pacific Grove, CA, USA
ISSN
1058-6393
Print_ISBN
0-8186-8316-3
Type
conf
DOI
10.1109/ACSSC.1997.679067
Filename
679067