DocumentCode
1533198
Title
Dynamic Features in the Linear-Logarithmic Hybrid Domain for Automatic Speech Recognition in a Reverberant Environment
Author
Ichikawa, Osamu ; Fukuda, Takashi ; Nishimura, Masafumi
Author_Institution
IBM Res. - Tokyo, Yamato, Japan
Volume
4
Issue
5
fYear
2010
Firstpage
816
Lastpage
823
Abstract
Static and dynamic features using Mel frequency cepstral coefficients (MFCCs) are widely used in automatic speech recognition. Since the MFCCs are calculated from logarithmic spectra, the delta and delta-delta features are effectively difference operations in the logarithmic domain. In a reverberant environment, speech signals have late reverberations, whose power follows a long-term exponential decay. This tends to cause the logarithmic delta to hold a constant value for a long time. This paper considers new schemes for calculating delta and delta-delta features that quickly diminish in the reverberant segments. Experiments using the evaluation framework for reverberant environments (CENSREC-4) showed significant improvements by simply replacing the MFCC dynamic features with the proposed dynamic features.
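The abstract's observation about reverberant tails can be checked numerically. The sketch below is an illustration only and is not the paper's proposed feature: it assumes an idealized power envelope `p0 * exp(-t / tau)` over frame index `t` (the names `p0`, `tau`, and the helper `delta` are hypothetical) and applies a common regression-style delta over ±2 frames. Under those assumptions the delta of the log power stays at `-1 / tau` throughout the tail, whereas the delta of the linear power shrinks along with the signal.

```python
import numpy as np

def delta(x, width=2):
    """Regression-style delta over +/- `width` frames; edges are padded by repetition."""
    n = len(x)
    padded = np.pad(x, width, mode="edge")
    num = sum(
        k * (padded[width + k: width + k + n] - padded[width - k: width - k + n])
        for k in range(1, width + 1)
    )
    den = 2 * sum(k * k for k in range(1, width + 1))
    return num / den

# Assumed toy reverberation tail: power decays exponentially with frame index.
tau = 10.0          # decay constant in frames (illustrative value)
p0 = 1.0            # initial power (illustrative value)
t = np.arange(50)
power = p0 * np.exp(-t / tau)

log_delta = delta(np.log(power))   # delta computed in the logarithmic domain
lin_delta = delta(power)           # delta computed in the linear domain

# Away from the padded edges, the log-domain delta is constant at -1/tau,
# while the linear-domain delta decays toward zero.
print(log_delta[2:-2].min(), log_delta[2:-2].max())
print(abs(lin_delta[5]), abs(lin_delta[40]))
```

This contrast is the motivation the abstract gives for dynamic features that diminish quickly in reverberant segments; how the paper combines the two domains is not specified in this record.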
Keywords
cepstral analysis; reverberation; speech recognition; CENSREC-4; MFCC; Mel frequency cepstral coefficients; automatic speech recognition; linear logarithmic hybrid domain; logarithmic delta-delta speech features; speech signal reverberations; Automatic speech recognition; Cepstral analysis; Discrete cosine transforms; Hidden Markov models; Mel frequency cepstral coefficient; Microphone arrays; Noise cancellation; Reverberation; Robustness; Transfer functions; Delta; Mel frequency cepstral coefficient (MFCC); dynamic feature; feature extraction; reverberation; robustness; speech recognition
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
IEEE
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2010.2057191
Filename
5508342