Generalization of temporal filter and linear transformation for robust speech recognition

Author

Duc Hoang Ha Nguyen ; Xiong Xiao ; Eng Siong Chng ; Haizhou Li

Author_Institution

School of Computer Engineering, Nanyang Technological University, Singapore

fYear

2014

fDate

4-9 May 2014

Firstpage

1730

Lastpage

1734

Abstract

Temporal filtering of feature trajectories and linear transformation of feature vectors are two effective ways to compensate speech features for robust speech recognition in noisy and reverberant environments. In previous studies, the two methods were usually applied in sequence, so the interaction between them was not optimized. In this paper, we propose a generalized transform that integrates the temporal filter and the linear transformation into a single process. The parameters of the new transform are optimized to minimize an approximated Kullback-Leibler divergence between the distribution of the compensated features and the distribution represented by a clean reference model. The proposed method is evaluated on the Aurora-5 clean-condition training task. The experiments show that the generalized transform significantly outperforms the simple cascade of temporal filtering and linear transformation. For example, in the speaker-based feature adaptation scheme, word accuracy improves from 81.55% (cascade) to 83.99% (generalized) for the office environment and from 72.09% (cascade) to 76.04% (generalized) for the living room environment.
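
The relationship between the cascade and the generalized transform can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so the form used here (one full matrix per frame offset plus a bias) is an assumption, as are all function names, the odd filter length, and the edge padding. The sketch only shows that a per-dimension temporal filter followed by a linear transformation is a special case of such a transform; it does not implement the Kullback-Leibler-based parameter estimation.

```python
import numpy as np

def _pad_context(X, L):
    # Repeat the first/last frame so every output frame has a full context
    # window of L frames (assumed boundary handling, L assumed odd).
    half = L // 2
    return np.pad(X, ((half, half), (0, 0)), mode="edge")

def temporal_filter(X, W):
    """Filter each feature trajectory separately.
    X: (T, D) feature matrix; W: (D, L) filter taps, one filter per dimension."""
    T, _ = X.shape
    L = W.shape[1]
    Xp = _pad_context(X, L)
    Y = np.zeros(X.shape, dtype=float)
    for k in range(L):
        Y += Xp[k:k + T] * W[:, k]   # tap k scales each dimension independently
    return Y

def linear_transform(X, A, b):
    """Apply y_t = A x_t + b to every frame."""
    return X @ A.T + b

def generalized_transform(X, G, b):
    """Hypothetical generalized form: y_t = sum_k G[k] x_{t+k-half} + b.
    G: (L, D, D), one full matrix per context offset."""
    T, _ = X.shape
    L = G.shape[0]
    Xp = _pad_context(X, L)
    Y = np.tile(b, (T, 1)).astype(float)
    for k in range(L):
        Y += Xp[k:k + T] @ G[k].T
    return Y

def cascade_as_generalized(W, A):
    """Express filter-then-transform as generalized matrices: G[k] = A diag(W[:, k])."""
    return np.stack([A @ np.diag(W[:, k]) for k in range(W.shape[1])])
```

Under these assumptions, any cascade can be reproduced by constraining each matrix to the form A diag(w_k), while an unconstrained set of matrices has more free parameters. This is one way to read the abstract's claim that optimizing the two steps jointly can do better than applying them in sequence.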

Keywords

filtering theory; speech recognition; transforms; Aurora-5 clean condition training task; approximated Kullback-Leibler divergence; clean reference model; compensated feature distribution; feature trajectory; feature vectors; generalized transform; linear transformation; noisy environments; reverberant environments; robust speech recognition; speaker based feature adaptation scheme; temporal filter generalization; Acoustics; Robustness; Speech; Speech processing; Speech recognition; Transforms; Vectors; Kullback-Leibler divergence; Robust speech recognition; linear transformation; reverberant speech recognition; temporal filter;

fLanguage

English

Publisher

IEEE

Conference_Titel

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6853894

Filename

6853894