DocumentCode
3422357
Title
Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer
Author
Delcroix, Marc ; Nakatani, Tomohiro ; Watanabe, Shinji
Author_Institution
NTT Commun. Sci. Labs., NTT Corp., Kyoto
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
4073
Lastpage
4076
Abstract
It is well known that automatic speech recognition performs poorly in the presence of noise or reverberation. Much research has been undertaken on model adaptation and speech enhancement to increase the robustness of speech recognizers. Model adaptation is effective at removing static mismatch between speech features and acoustic model parameters, but may not cope well with dynamic mismatch. Speech enhancement approaches can reduce dynamic perturbations, but often do not interconnect well with the speech recognizer. There seems to be no optimal way to combine these two approaches. In this paper, we propose introducing the dynamic capabilities of speech enhancement into a static adaptation scheme. We focus on variance adaptation and propose a novel parametric variance model that includes static and dynamic components. The dynamic component is derived from a speech enhancement pre-process, and the parameters of the model are optimized using an adaptive training scheme. An evaluation of the method using speech dereverberation as the pre-processor revealed that an 80% relative error rate reduction was possible compared with the recognition of dereverberated speech, and the final error rate was 5.4%, which is close to that of clean speech (1.2%).
Keywords
speech enhancement; speech recognition; adaptive training scheme; automatic speech recognition; dynamic variance adaptation; speech dereverberation; speech enhancement preprocessor; Acoustic noise; Adaptation model; Automatic speech recognition; Error analysis; Maximum likelihood linear regression; Noise robustness; Reverberation; Speech analysis; Speech enhancement; Speech recognition; Dereverberation; Model adaptation; Robust ASR; Variance compensation
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV
ISSN
1520-6149
Print_ISBN
978-1-4244-1483-3
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2008.4518549
Filename
4518549