DocumentCode
3340299
Title
Evaluation of mel-LPC cepstrum in a large vocabulary continuous speech recognition
Author
Matsumoto, Harosha ; Moroto, Masanori
Author_Institution
Fac. of Eng., Shinshu Univ., Nagano, Japan
Volume
1
fYear
2001
fDate
2001
Firstpage
117
Abstract
This paper presents a simple and efficient time-domain technique to estimate an all-pole model on the mel-frequency scale (mel-LPC), and compares the recognition performance of the mel-LPC cepstrum with those of both the standard LPC mel-cepstrum and the MFCC (mel-frequency cepstral coefficients) using the Japanese dictation system Julius with a 20,000-word vocabulary. First, the optimal value of the frequency warping factor is examined in terms of monosyllable accuracy. With the optimal warping factors, the mel-LPC cepstrum attains word accuracies of 93.0% for male speakers and 93.1% for female speakers, which are 2.1% and 1.7% higher than those of the LPC mel-cepstrum, respectively. Furthermore, this performance is slightly superior to that of the MFCC.
Keywords
cepstral analysis; linear predictive coding; speech coding; speech recognition; time-domain analysis; Japanese dictation system; Julius; MFCC; all-pole model; female speakers; frequency warping factor; large vocabulary continuous speech recognition; male speakers; mel frequency scale; mel-LPC cepstrum; monosyllable accuracy; recognition performance; time domain technique; word accuracies; Automatic speech recognition; Cepstral analysis; Cepstrum; Frequency conversion; Linear predictive coding; Mel frequency cepstral coefficient; Psychoacoustic models; Spectral analysis; Speech recognition; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
Conference_Location
Salt Lake City, UT
ISSN
1520-6149
Print_ISBN
0-7803-7041-4
Type
conf
DOI
10.1109/ICASSP.2001.940781
Filename
940781