DocumentCode
3558656
Title
Normalization of the Speech Modulation Spectra for Robust Speech Recognition
Author
Xiao, Xiong ; Chng, Eng Siong ; Li, Haizhou
Author_Institution
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
Volume
16
Issue
8
fYear
2008
Firstpage
1662
Lastpage
1674
Abstract
In this paper, we study a novel technique that normalizes the modulation spectra of speech signals for robust speech recognition. The modulation spectra of a speech signal are the power spectral density (PSD) functions of the feature trajectories generated from the signal; hence, they describe the temporal structure of the features. The modulation spectra are distorted when the speech signal is corrupted by noise. We propose the temporal structure normalization (TSN) filter to reduce the noise effects by normalizing the modulation spectra to reference spectra. The TSN filter differs from other feature normalization methods, such as histogram equalization (HEQ), that normalize only the probability distributions of the speech features. Our previous work showed promising results for TSN on the small-vocabulary Aurora-2 task. In this paper, we examine theoretical and practical issues of the TSN filter, including the following. 1) We investigate the effects of noise on the speech modulation spectra and show the general characteristics of noisy speech modulation spectra. These observations help to further explain and justify the TSN filter. 2) We evaluate the TSN filter on the Aurora-4 task and demonstrate its effectiveness for a large-vocabulary task. 3) We propose a segment-based implementation of the TSN filter that significantly reduces the processing delay without affecting performance. Overall, the TSN filter produces significant improvements over the baseline systems and delivers competitive results when compared to other state-of-the-art temporal filters.
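The idea of normalizing a modulation spectrum to a reference spectrum can be illustrated with a minimal sketch. It is not the authors' TSN filter design: it assumes a plain periodogram as the PSD estimate, applies the square-root gain directly in the frequency domain rather than through a designed temporal filter, and uses synthetic trajectories. The names `modulation_spectrum` and `normalize_to_reference` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def modulation_spectrum(trajectory):
    """Periodogram estimate of the PSD of one feature trajectory (values over frames)."""
    trajectory = np.asarray(trajectory, dtype=float)
    return np.abs(np.fft.rfft(trajectory)) ** 2 / len(trajectory)


def normalize_to_reference(trajectory, reference_psd, eps=1e-12):
    """Rescale each modulation-frequency bin so the trajectory's PSD matches reference_psd.

    Assumption: the gain is applied per bin in the frequency domain, which is a
    simplification of filtering the trajectory with a designed temporal filter.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    spectrum = np.fft.rfft(trajectory)
    psd = np.abs(spectrum) ** 2 / len(trajectory)
    # Square root because the PSD is a squared magnitude; eps guards against division by zero.
    gain = np.sqrt(reference_psd / np.maximum(psd, eps))
    return np.fft.irfft(gain * spectrum, n=len(trajectory))


# Synthetic stand-ins for feature trajectories (not real speech features).
rng = np.random.default_rng(0)
clean = np.convolve(rng.standard_normal(200), np.ones(5) / 5, mode="same")
noisy = clean + 0.5 * rng.standard_normal(200)

# In practice a reference would be estimated from clean training data;
# here a single clean trajectory stands in for it.
reference_psd = modulation_spectrum(clean)
normalized = normalize_to_reference(noisy, reference_psd)
```

After this step the periodogram of `normalized` matches `reference_psd`, while the phase of the noisy trajectory is left unchanged. Only the temporal structure is adjusted in this sketch, not the probability distribution of the feature values.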
Keywords
modulation; speech recognition; statistical distributions; histogram equalization; power spectral density functions; probability distributions; robust speech recognition; speech features; speech modulation spectra; temporal structure normalization; Distortion; Filters; Histograms; Noise reduction; Noise robustness; Power generation; Signal generators; Speech enhancement; Speech recognition; Vocabulary; Aurora task; feature normalization; modulation spectrum; temporal filter
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2008.2002082
Filename
4648211