Selecting static and dynamic features using an advanced auditory model for speech recognition

Author

Koniaris, Christos ; Chatterjee, Saikat ; Kleijn, W. Bastiaan

Author_Institution

Sound & Image Process. Lab., KTH - R. Inst. of Technol., Stockholm, Sweden

fYear

2010

fDate

14-19 March 2010

Firstpage

4342

Lastpage

4345

Abstract

We describe a method, based on a quantitative model of the human auditory periphery, for selecting features for speech recognition. The method maximizes the similarity between the geometry of the space spanned by the selected subset of features and the geometry of the space spanned by the auditory-model output. It uses a spectro-temporal auditory model that captures both frequency-domain and time-domain masking. The selection is blind to the meaning of speech and does not require annotated speech data. We apply the method to selecting a subset of features from a conventional set consisting of mel cepstra and their first-order and second-order time derivatives. Although our method uses only knowledge of the human auditory periphery, the experimental results show that it performs significantly better than feature-reduction algorithms based on linear and heteroscedastic discriminant analysis, which require training with annotated speech data.
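The abstract combines two ingredients: a conventional static-plus-dynamic feature set (mel cepstra with first- and second-order time derivatives) and a selection rule that matches the geometry of a feature subset to that of an auditory-model output. The following is a minimal, hypothetical Python sketch of both ideas, not the authors' implementation. The derivatives use the common regression formula with edge-frame replication, and the window of 2 frames is an assumption. The abstract does not state the similarity measure, so the selection step substitutes a plainly hypothetical criterion: greedy forward selection that maximizes the Pearson correlation between pairwise squared distances in the selected feature subspace and in a `reference` representation. `reference` merely stands in for the auditory-model output, since no auditory model is implemented here. All function names are illustrative.

```python
import itertools
import math


def deltas(frames, window=2):
    """Regression time derivatives of a list of feature vectors (lists):
    d_t = sum_{n=1..W} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..W} n^2),
    with the first/last frame replicated at the edges (assumed window W=2)."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, window + 1):
            ahead = frames[min(t + n, T - 1)]
            behind = frames[max(t - n, 0)]
            for i in range(len(d)):
                d[i] += n * (ahead[i] - behind[i])
        out.append([v / denom for v in d])
    return out


def stack_static_dynamic(static, window=2):
    """Concatenate static, first-order and second-order features per frame."""
    d1 = deltas(static, window)
    d2 = deltas(d1, window)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def pairwise_sq_dists(vectors, dims=None):
    """Squared Euclidean distance, restricted to `dims`, for every frame pair."""
    if dims is None:
        dims = range(len(vectors[0]))
    return [sum((u[i] - v[i]) ** 2 for i in dims)
            for u, v in itertools.combinations(vectors, 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_geometry_selection(features, reference, k):
    """HYPOTHETICAL criterion: greedily add the feature index whose inclusion
    makes subset distances correlate best with distances in `reference`."""
    target = pairwise_sq_dists(reference)
    selected, remaining = [], list(range(len(features[0])))
    for _ in range(k):
        best = max(remaining, key=lambda j: pearson(
            pairwise_sq_dists(features, selected + [j]), target))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: dimension 1 is constant; the stand-in "auditory output"
    # keeps only dimensions 0 and 2.
    feats = [[0, 5, 0], [1, 5, 3], [3, 5, 1], [6, 5, 2]]
    ref = [[f[0], f[2]] for f in feats]
    print(greedy_geometry_selection(feats, ref, 2))  # → [0, 2]
```

In the toy example the constant dimension contributes no distance information, so the greedy rule recovers exactly the two dimensions that define the reference geometry. With real data, `features` would be the stacked output of `stack_static_dynamic` applied to mel-cepstral frames, and the correlation criterion would be replaced by whatever geometric similarity measure the auditory model actually defines.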

Keywords

cepstral analysis; feature extraction; speech intelligibility; speech recognition; dynamic feature; feature selection; first-order time derivatives; frequency-domain masking; geometry; human auditory periphery; mel cepstra; quantitative model; second-order time derivatives; spectro-temporal auditory model; speech recognition; static feature; time-domain masking; Algorithm design and analysis; Cepstral analysis; Frequency; Geometry; Humans; Performance analysis; Solid modeling; Speech analysis; Speech recognition; Time domain analysis; auditory model; dimension reduction; distortion; feature selection; perception; sensitivity analysis; speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5495648

Filename

5495648