Dynamic Features in the Linear-Logarithmic Hybrid Domain for Automatic Speech Recognition in a Reverberant Environment

Author

Ichikawa, Osamu; Fukuda, Takashi; Nishimura, Masafumi

Author_Institution

IBM Research - Tokyo, Yamato, Japan

Volume

4

Issue

5

fYear

2010

Firstpage

816

Lastpage

823

Abstract

Static and dynamic features based on Mel frequency cepstral coefficients (MFCCs) are widely used in automatic speech recognition. Because MFCCs are calculated from logarithmic spectra, the delta and delta-delta features are difference operations in the logarithmic domain. In a reverberant environment, speech signals contain late reverberation whose power follows a long-term exponential decay. This tends to keep the logarithmic delta at a constant value for a long time. This paper considers new schemes for calculating delta and delta-delta features that diminish quickly in reverberant segments. Experiments using the evaluation framework for reverberant environments (CENSREC-4) showed significant improvements from simply replacing the MFCC dynamic features with the proposed dynamic features.
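The abstract's argument can be checked numerically. If late reverberation makes frame power decay as P_t = P_0 d^t with 0 < d < 1, then log P_t = log P_0 + t log d is a straight line in t, so a regression-based delta computed in the log domain returns the constant slope log d for as long as the tail lasts, while the same delta computed on linear power shrinks in proportion to d^t. The sketch below is a minimal illustration under stated assumptions: a common regression delta with a ±2-frame window, an idealized noise-free decay, and a hypothetical per-frame power ratio d = 0.8. The linear-domain delta is shown only as a contrast; it is not the paper's proposed linear-logarithmic hybrid feature, whose definition is not given in this record.

```python
import math


def delta(frames, window=2):
    """Regression-based delta over +/- `window` frames.

    Assumption: edge frames are padded by repeating the first/last value.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    T = len(frames)
    out = []
    for t in range(T):
        acc = 0.0
        for n in range(1, window + 1):
            nxt = frames[min(t + n, T - 1)]
            prv = frames[max(t - n, 0)]
            acc += n * (nxt - prv)
        out.append(acc / denom)
    return out


# Hypothetical late-reverberation tail: power decays by a fixed ratio per frame.
P0 = 1.0
decay = 0.8  # assumed per-frame power ratio d
power = [P0 * decay ** t for t in range(30)]

# Delta in the log domain (as with MFCC dynamic features) vs. on linear power.
log_delta = delta([math.log(p) for p in power])
lin_delta = delta(power)

for t in (5, 10, 20):
    print(t, round(log_delta[t], 4), round(lin_delta[t], 6))
```

On interior frames, `log_delta` stays pinned at log 0.8 (about -0.223), while the magnitude of `lin_delta` falls by a factor of 0.8 per frame, which is the "quickly diminish" behavior the abstract asks of the new dynamic features.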

Keywords

cepstral analysis; reverberation; speech recognition; CENSREC-4; MFCC; Mel frequency cepstral coefficients; automatic speech recognition; linear logarithmic hybrid domain; logarithmic delta-delta speech features; speech signal reverberations; Discrete cosine transforms; Hidden Markov models; Mel frequency cepstral coefficient; Microphone arrays; Noise cancellation; Robustness; Transfer functions; Delta; Mel frequency cepstral coefficient (MFCC); dynamic feature; feature extraction

fLanguage

English

Journal_Title

IEEE Journal of Selected Topics in Signal Processing

Publisher

IEEE

ISSN

1932-4553

Type

jour

DOI

10.1109/JSTSP.2010.2057191

Filename

5508342