Regional Information Center for Science and Technology - Robust speech recognition through selection of speaker and environment transforms

DocumentCode :

3163192

Title :

Robust speech recognition through selection of speaker and environment transforms

Author :

Bilgi, R. ; Joshi, Vinayak ; Umesh, S. ; Garcia, Luis ; Benitez, Carmen

Author_Institution :

Dept. of Electr. Eng., Indian Inst. of Technol., Madras, Chennai, India

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4333

Lastpage :

4336

Abstract :

In this paper, we address the problem of robustness to both noise and speaker variability in automatic speech recognition (ASR). We propose the use of pre-computed noise and speaker transforms; the optimal combination of the two is chosen at test time using a maximum-likelihood (ML) criterion. The pre-computed transforms are obtained during training, using data from the different noise conditions that are usually encountered for the particular ASR task. The environment transforms are estimated within the constrained-MLLR (CMLLR) framework, while for the speaker transforms we use analytically determined linear-VTLN matrices. Even though the exact noise environment may not be encountered during test, the ML-based choice of the closest environment transform provides “sufficient” cleaning. This is corroborated by experimental results, with performance comparable to histogram equalization or Vector Taylor Series approaches on the Aurora-2 task. The proposed method is simple, since it involves only the choice of pre-computed environment and speaker transforms, and can therefore be applied with very little test data, unlike many other speaker- and noise-compensation methods.
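
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a single diagonal-covariance Gaussian in place of the HMM acoustic model, 2-dimensional features, the environment transform applied before the speaker transform, and scaled-identity matrices as hypothetical stand-ins for linear-VTLN matrices. All function and bank names are invented for the example.

```python
import math
from itertools import product

def apply_affine(A, b, x):
    """Return A x + b for a small dense matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]

def det2(A):
    """Determinant of a 2x2 matrix (this sketch is restricted to 2-D features)."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def diag_gauss_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def score_pair(frames, env, spk, mean, var):
    """Log-likelihood of the frames after the environment transform followed by
    the speaker transform, including the log|det A| Jacobian of each transform."""
    (A_e, b_e), (A_s, b_s) = env, spk
    log_jac = math.log(abs(det2(A_e))) + math.log(abs(det2(A_s)))
    total = 0.0
    for x in frames:
        y = apply_affine(A_s, b_s, apply_affine(A_e, b_e, x))
        total += diag_gauss_loglik(y, mean, var) + log_jac
    return total

def select_transforms(frames, env_bank, spk_bank, mean, var):
    """Return the (environment, speaker) key pair with the highest likelihood."""
    return max(product(env_bank, spk_bank),
               key=lambda p: score_pair(frames, env_bank[p[0]], spk_bank[p[1]],
                                        mean, var))

# Hypothetical pre-computed banks (in the paper these would come from training).
I2 = [[1.0, 0.0], [0.0, 1.0]]
env_bank = {
    "clean": (I2, [0.0, 0.0]),
    "offset_noise": (I2, [-2.0, -2.0]),  # undoes a constant +2 feature shift
}
spk_bank = {
    "warp_0.8": ([[0.8, 0.0], [0.0, 0.8]], [0.0, 0.0]),
    "warp_1.0": (I2, [0.0, 0.0]),
    "warp_1.2": ([[1.2, 0.0], [0.0, 1.2]], [0.0, 0.0]),
}

# Toy test frames: unit-variance data shifted by +2 in each dimension.
frames = [[3.0, 1.0], [1.0, 3.0], [3.0, 3.0], [1.0, 1.0]]
best = select_transforms(frames, env_bank, spk_bank,
                         mean=[0.0, 0.0], var=[1.0, 1.0])
print(best)  # ('offset_noise', 'warp_1.0')
```

Because nothing is estimated at test time and the search only scores a fixed set of candidate pairs, a handful of frames is enough to make the choice, which is in line with the abstract's remark that the method needs very little test data.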

Keywords :

maximum likelihood estimation; speech recognition; transforms; ASR; Aurora-2 task; CMLLR; automatic speech recognition; constrained MLLR; environment transforms; histogram equalization; linear VTLN matrices; maximum likelihood linear regression; noise compensation; noise conditions; robust speech recognition; speaker transforms; speaker variability; vector Taylor series approaches; Abstracts; Hidden Markov models; Noise; Noise measurement; Robustness; Transforms; environment adaptation; robustness; speaker adaptation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288878

Filename :

6288878

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3163192