DocumentCode
863235
Title
Discriminative auditory-based features for robust speech recognition
Author
Mak, Brian Kan-Wing ; Tam, Yik-Cheung ; Li, Peter Qi
Author_Institution
Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., China
Volume
12
Issue
1
fYear
2004
Firstpage
27
Lastpage
36
Abstract
Recently, a new auditory-based feature extraction algorithm for robust speech recognition in noisy environments was proposed. The new features are derived by closely mimicking the human peripheral auditory process; the filters in the outer ear, middle ear, and inner ear are obtained from the psychoacoustics literature with some manual adjustments. In this paper, we extend the auditory-based feature extraction algorithm and propose to further train the auditory-based filters through discriminative training. Using this data-driven approach, we optimize the filters by minimizing the subsequent recognition errors on a task. One significant contribution over similar past efforts (generally under the name of "discriminative feature extraction") is that we make no assumption about the parametric form of the auditory-based filters. Instead, we only require the filters to be triangular-like: the filter weights have a maximum value in the middle and then decrease monotonically toward both ends. Discriminative training of these constrained auditory-based filters leads to improved performance. Furthermore, we study the combined discriminative training procedure for both feature and acoustic model parameters. Our experiments show that the best performance can be obtained with a sequential procedure under the unified framework of minimum classification error training with generalized probabilistic descent (MCE/GPD).
Keywords
acoustic filters; feature extraction; hearing; optimisation; parameter estimation; speech recognition; auditory-based filters; discriminative auditory-based features; discriminative feature extraction; discriminative training; feature extraction algorithm; generalized probabilistic descent; human peripheral auditory process; minimum classification error; noisy environment; psychoacoustics; recognition errors; robust speech recognition; Automatic speech recognition; Ear; Feature extraction; Filters; Hidden Markov models; Mathematical model; Psychoacoustic models; Robustness; Speech recognition; Working environment noise
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/TSA.2003.819951
Filename
1261269