DocumentCode :
1457400
Title :
An Auditory Motivated Asymmetric Compression Technique for Speech Recognition
Author :
Haque, Serajul ; Togneri, Roberto ; Zaknich, Anthony
Author_Institution :
Dept. of Electr., Electron., & Comput. Eng., Univ. of Western Australia, Crawley, WA, Australia
Volume :
19
Issue :
7
fYear :
2011
Firstpage :
2111
Lastpage :
2124
Abstract :
The Mel-frequency cepstral coefficient (MFCC) parameterization for automatic speech recognition (ASR) utilizes several perceptual features of the human auditory system, one of which is the static compression. Motivated by the human auditory system, the conventional static logarithmic compression applied in the MFCC is analyzed using psychophysical loudness perception curves. Following the property of the auditory system that the dynamic range compression is higher in the basal regions than the apical regions of the basilar membrane, we propose a method of unequal (asymmetric) compression, i.e., higher compression applied in the higher frequency regions than the lower frequency regions. The method is applied and tested in the MFCC and the PLP parameterizations in the spectral domain, and the ZCPA auditory model used as an ASR front-end in the temporal domain. The extent of the asymmetric compression is applied as a multiplicative gain to the existing static compression, and is determined from the gradient of the piece-wise linear segment of the perceptual compression curve. The proposed method has the advantage of adjusting compression parametrically for improved ASR performance and audibility in noise conditions by low-frequency spectral enhancement, particularly of vowels with lower F1 and F2 formants. Continuous-density HMM recognition using the Aurora 2 corpus and the TIdigits shows performance improvements in additive noise conditions.
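The idea of applying a frequency-dependent multiplicative gain to the static logarithmic compression can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `asymmetric_log_compress`, the linear gain schedule, and the default values `gain_low`, `gain_high`, and `floor` are all hypothetical. The abstract states that the gain is derived from the gradient of a perceptual compression curve, which is not reproduced here. The sketch also assumes that "higher compression" corresponds to a smaller gain on the log output, so that higher bands end up with a smaller output dynamic range.

```python
import numpy as np

def asymmetric_log_compress(band_energies, gain_low=1.0, gain_high=0.6, floor=1e-10):
    """Log-compress filterbank energies with a per-band multiplicative gain.

    band_energies: array of shape (..., n_bands), ordered from low to high frequency.
    gain_low, gain_high: hypothetical gains for the lowest and highest band.
    A gain below 1 shrinks the output dynamic range of that band, i.e. it
    compresses more than the plain logarithm (assumption, see lead-in).
    """
    energies = np.asarray(band_energies, dtype=float)
    n_bands = energies.shape[-1]
    # Assumed linear gain schedule across bands; the paper derives its gain
    # from a perceptual compression curve instead.
    gains = np.linspace(gain_low, gain_high, n_bands)
    # Floor the energies to avoid log(0), then scale the static log compression.
    return gains * np.log(np.maximum(energies, floor))

# Two frames (quiet and loud) with three bands each.
frames = np.array([[10.0, 10.0, 10.0],
                   [1000.0, 1000.0, 1000.0]])
compressed = asymmetric_log_compress(frames)
# Output dynamic range per band; it shrinks toward the higher bands.
dynamic_range = compressed[1] - compressed[0]
```

Setting `gain_low` and `gain_high` both to 1.0 recovers the conventional static logarithmic compression, which makes it easy to compare the two front-ends under the same recognizer.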
Keywords :
data compression; hearing; speech recognition; PLP parameterizations; additive noise conditions; auditory motivated asymmetric compression; conventional static logarithmic compression; human auditory system; mel frequency cepstral coefficient parameterization; perceptual compression curve; piece wise linear segment; speech recognition; static compression; Auditory system; Dynamic range; Humans; Mel frequency cepstral coefficient; Noise; Speech; Speech recognition; Auditory compression; auditory system; feature extraction; hidden Markov model (HMM); speech recognition;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2112646
Filename :
5719293