A comparison of front-end compensation strategies for robust LVCSR under room reverberation and increased vocal effort

Author

Seyed Omid Sadjadi;Hynek Bořil;John H.L. Hansen

Author_Institution

Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, USA

fYear

2012

Firstpage

4701

Lastpage

4704

Abstract

Automatic speech recognition performance is known to deteriorate in the presence of room reverberation and variation in speakers' vocal effort. This study examines the robustness of several state-of-the-art front-end feature extraction and normalization strategies to these sources of speech signal variability in the context of large vocabulary continuous speech recognition (LVCSR). A speech database recorded in an anechoic room, capturing modal speech and speech produced at different levels of vocal effort, is reverberated using measured room impulse responses and used in the evaluations. The combination of the recently introduced mean Hilbert envelope coefficients (MHEC) and a normalization strategy that pairs cepstral gain normalization with modified RASTA filtering (CGN RASTALP) is shown to provide considerable recognition performance gains for reverberant modal and high vocal effort speech.

Keywords

"Speech","Reverberation","Speech recognition","Mel frequency cepstral coefficient","Robustness","Feature extraction"

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Type

conf

DOI

10.1109/ICASSP.2012.6288968

Filename

6288968