Discriminative auditory-based features for robust speech recognition

Author

Mak, Brian Kan-Wing ; Tam, Yik-Cheung ; Li, Peter Qi

Author_Institution

Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., China

Volume

12

Issue

1

fYear

2004

Firstpage

27

Lastpage

36

Abstract

Recently, a new auditory-based feature extraction algorithm for robust speech recognition in noisy environments was proposed. The new features are derived by closely mimicking the human peripheral auditory process; the filters in the outer ear, middle ear, and inner ear are obtained from the psychoacoustics literature with some manual adjustments. In this paper, we extend the auditory-based feature extraction algorithm and propose to further train the auditory-based filters through discriminative training. Using this data-driven approach, we optimize the filters by minimizing the subsequent recognition errors on a task. One significant contribution over similar past efforts (generally under the name of "discriminative feature extraction") is that we make no assumption about the parametric form of the auditory-based filters. Instead, we only require the filters to be triangular-like: the filter weights have a maximum value in the middle and decrease monotonically toward both ends. Discriminative training of these constrained auditory-based filters leads to improved performance. Furthermore, we study the combined discriminative training procedure for both feature and acoustic model parameters. Our experiments show that the best performance can be obtained with a sequential procedure under the unified framework of minimum classification error (MCE) training with generalized probabilistic descent (GPD).
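
The triangular-like constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `is_triangular_like` is hypothetical, and two points are assumptions made for the illustration, namely that "monotonically" is read as non-strict (plateaus allowed) and that "in the middle" is read as "not at either endpoint".

```python
from typing import Sequence


def is_triangular_like(weights: Sequence[float]) -> bool:
    """Check that weights rise to an interior peak and then fall.

    Assumptions (not from the paper): monotonicity is non-strict,
    and the peak may sit anywhere except the two endpoints.
    """
    n = len(weights)
    if n < 3:
        # Too short to have an interior peak.
        return False
    # Index of the first occurrence of the maximum weight.
    peak = max(range(n), key=lambda i: weights[i])
    if peak == 0 or peak == n - 1:
        return False
    # Non-decreasing up to the peak, non-increasing after it.
    rising = all(weights[i] <= weights[i + 1] for i in range(peak))
    falling = all(weights[i] >= weights[i + 1] for i in range(peak, n - 1))
    return rising and falling
```

A check of this kind could be used to verify that filter weights still satisfy the shape constraint after each training update; how the constraint is actually enforced during training is not stated in the abstract.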

Keywords

acoustic filters; feature extraction; hearing; optimisation; parameter estimation; speech recognition; auditory-based filters; discriminative auditory-based features; discriminative feature extraction; discriminative training; feature extraction algorithm; generalized probabilistic descent; human peripheral auditory process; minimum classification error; noisy environment; psychoacoustics; recognition errors; robust speech recognition; Automatic speech recognition; Ear; Feature extraction; Filters; Hidden Markov models; Mathematical model; Psychoacoustic models; Robustness; Speech recognition; Working environment noise

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/TSA.2003.819951

Filename

1261269