Evaluation of mel-LPC cepstrum in a large vocabulary continuous speech recognition

Author

Matsumoto, Hiroshi; Moroto, Masanori

Author_Institution

Fac. of Eng., Shinshu Univ., Nagano, Japan

Volume

1

fYear

2001

fDate

2001

Firstpage

117

Abstract

This paper presents a simple and efficient time-domain technique to estimate an all-pole model on the mel-frequency scale (mel-LPC), and compares the recognition performance of the mel-LPC cepstrum with those of both the standard LPC mel-cepstrum and the MFCC (mel-frequency cepstral coefficient) using the Japanese dictation system Julius with a 20,000-word vocabulary. First, the optimal value of the frequency warping factor is examined in terms of monosyllable accuracy. With the optimal warping factors, the mel-LPC cepstrum attains word accuracies of 93.0% for male speakers and 93.1% for female speakers, which are 2.1% and 1.7% higher, respectively, than those of the LPC mel-cepstrum. Furthermore, this performance is slightly superior to that of the MFCC.
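The abstract does not spell out the estimator itself. As an illustration only, the sketch below uses a related and widely known formulation, frequency-warped LPC, in which each unit delay is replaced by the first-order all-pass D(z) = (z^-1 - alpha) / (1 - alpha z^-1). It is a swapped-in technique, not necessarily the paper's mel-LPC procedure, which may differ in details such as normalization. The function names are hypothetical, `alpha` stands in for the frequency warping factor, and the input frame is assumed to be already windowed.

```python
import math
from typing import List, Tuple


def warp_frequency(omega: float, alpha: float) -> float:
    """Map a normalized angular frequency (0..pi) onto the warped axis
    induced by the all-pass D(z) = (z^-1 - alpha) / (1 - alpha z^-1)."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warped_autocorrelation(frame: List[float], order: int,
                           alpha: float) -> List[float]:
    """r[k] = sum_n x[n] * y_k[n], where y_0 = x and y_k is y_{k-1}
    filtered by D(z) with zero initial state (alpha = 0 gives plain delays)."""
    y = list(frame)
    r = [sum(v * v for v in frame)]
    for _ in range(order):
        out = []
        prev_in = prev_out = 0.0
        for v in y:
            # All-pass difference equation: y_k[n] = -a*y_{k-1}[n] + y_{k-1}[n-1] + a*y_k[n-1]
            cur = -alpha * v + prev_in + alpha * prev_out
            out.append(cur)
            prev_in, prev_out = v, cur
        y = out
        # x is zero outside the frame, so summing over the frame suffices.
        r.append(sum(p * q for p, q in zip(frame, y)))
    return r


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a[1..p] with x_hat[n] = sum_k a[k] x[n-k],
    plus the final prediction-error power. Assumes r[0] > 0."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: List[float], n_ceps: int) -> List[float]:
    """Cepstrum c[1..n_ceps] of 1 / (1 - sum_k a[k] z^-k); c[0] is omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def mel_lpc_cepstrum(frame: List[float], order: int, alpha: float,
                     n_ceps: int) -> List[float]:
    """Hypothetical pipeline: warped autocorrelation -> LPC -> cepstrum."""
    r = warped_autocorrelation(frame, order, alpha)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)


if __name__ == "__main__":
    # Illustrative Hamming-windowed test frame and an illustrative alpha.
    frame = [(math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n + 0.02 * n * n))
             * (0.54 - 0.46 * math.cos(2 * math.pi * n / 255))
             for n in range(256)]
    print(mel_lpc_cepstrum(frame, order=12, alpha=0.4, n_ceps=16))
```

Because the all-pass chain is lossless, the warped autocorrelation keeps the Toeplitz structure that Levinson-Durbin relies on, so the usual LPC and LPC-to-cepstrum recursions carry over unchanged; setting `alpha = 0` reduces the sketch to ordinary LPC cepstrum analysis. The value of `alpha` here is arbitrary; the paper selects the warping factor by monosyllable accuracy.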

Keywords

cepstral analysis; linear predictive coding; speech coding; speech recognition; time-domain analysis; Japanese dictation system; Julius; MFCC; all-pole model; female speakers; frequency warping factor; large vocabulary continuous speech recognition; male speakers; mel frequency scale; mel-LPC cepstrum; monosyllable accuracy; recognition performance; time domain technique; word accuracies; Automatic speech recognition; Cepstral analysis; Cepstrum; Frequency conversion; Linear predictive coding; Mel frequency cepstral coefficient; Psychoacoustic models; Spectral analysis; Speech recognition; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on

Conference_Location

Salt Lake City, UT

ISSN

1520-6149

Print_ISBN

0-7803-7041-4

Type

conf

DOI

10.1109/ICASSP.2001.940781

Filename

940781