Abstract :
In this paper, we investigate a technique consisting of mean subtraction, variance normalization, and time sequence filtering. Unlike other techniques, it applies autoregressive moving-average (ARMA) filtering directly in the cepstral domain. We call this technique mean subtraction, variance normalization, and ARMA filtering (MVA) post-processing, and speech features with MVA post-processing are called MVA features. Overall, compared to raw features without post-processing, MVA features achieve an error rate reduction of 45% on matched tasks and 65% on mismatched tasks on the Aurora 2.0 noisy speech database, and an average 57% error reduction on the Aurora 3.0 database. These improvements are comparable to the results of much more complicated techniques, even though MVA is relatively simple and requires practically no additional computational cost. In addition to describing MVA processing, we present a novel analysis of the distortion of mel-frequency cepstral coefficients and the log energy in the presence of different types of noise. The effectiveness of MVA is extensively investigated with respect to several variations: the configuration used to extract the raw features and the type of raw features, the domains where MVA is applied, the filters that are used, the ARMA filter orders, and the causality of the normalization process. Specifically, it is argued and demonstrated that MVA works better when applied to the zeroth-order cepstral coefficient than to log energy, that MVA works better in the cepstral domain, that an ARMA filter is better than either a designed finite impulse response filter or a data-driven filter, and that a five-tap ARMA filter is sufficient to achieve good performance in a variety of settings. We also investigate and evaluate a multi-domain MVA generalization.
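The three MVA steps named in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, since the abstract does not specify them: the function name `mva`, per-utterance statistics, the non-causal ARMA form y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1), and leaving the first and last M frames unfiltered. With M = 2 the filter has 2M + 1 = 5 taps, which is one reading of the "five-tap" filter mentioned above.

```python
import numpy as np

def mva(features, order=2, eps=1e-8):
    """Hypothetical MVA post-processing sketch.

    features: array of shape (T, D), T frames by D cepstral coefficients.
    order:    ARMA order M (assumed form; 2M + 1 taps in total).
    """
    x = np.asarray(features, dtype=float)

    # Mean subtraction: remove the per-utterance mean of each coefficient.
    x = x - x.mean(axis=0)

    # Variance normalization: scale each coefficient to unit variance.
    x = x / (x.std(axis=0) + eps)

    # ARMA filtering along time (assumed form, see lead-in):
    # average the previous M outputs with the current and next M inputs.
    T = x.shape[0]
    y = x.copy()  # edge frames stay as the normalized input (assumption)
    for t in range(order, T - order):
        ar_part = y[t - order:t].sum(axis=0)      # past filtered outputs
        ma_part = x[t:t + order + 1].sum(axis=0)  # current and future inputs
        y[t] = (ar_part + ma_part) / (2 * order + 1)
    return y
```

Calling `mva(cepstra, order=2)` on a (frames, coefficients) array returns an array of the same shape. Because this form looks M frames ahead, it is not causal; a causal variant would need a different filter and running statistics.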
Keywords :
FIR filters; autoregressive moving average processes; filtering theory; speech processing; ARMA filtering; MVA processing; auto-regression moving-average filtering; cepstral domain; data-driven filter; finite impulse response filter; mel-frequency cepstral coefficients; speech features; time sequence filtering; variance normalization; zeroth-order cepstral coefficient; Automatic speech recognition; Cepstral analysis; Filtering; Hidden Markov models; Mel frequency cepstral coefficient; Noise robustness; Spatial databases; Training data; ARMA filter; Aurora 2.0; Aurora 3.0; MFCC; RASTA; feature extraction; front end processing; mean subtraction; speech recognition; temporal smoothing
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on