DocumentCode
3013400
Title
Distance measure for speech recognition based on the smoothed group delay spectrum
Author
Itakura, Fumitada ; Umezaki, Taizo
Author_Institution
Nagoya University, Nagoya-shi, Japan
Volume
12
fYear
1987
fDate
31868
Firstpage
1257
Lastpage
1260
Abstract
We present a novel spectral distance measure based on the smoothed LPC group delay spectrum, which gives stable recognition performance under variable frequency transfer characteristics and additive noise. The weight of the n-th cepstral coefficient in our measure is given by $W_n = n^s \exp(-n^2 / (2\tau^2))$, which can be adjusted by selecting proper values of s and τ. To optimize the parameters of this distance measure, extensive experiments were carried out on a speaker-dependent isolated word recognition system using a standard dynamic time warping technique. The input speech data is a set of 68 pairs of phonetically very similar Japanese city names spoken by male speakers. The experimental results show that our distance measure gives a robust recognition rate despite variation in frequency characteristics and signal-to-noise ratio (SNR). Under noisy conditions with a segmental SNR of 20 dB, the recognition rate was more than 13% higher than that obtained with the standard Euclidean cepstral distance measure. Finally, it is shown that the optimum value of s is approximately 1, and the optimum value of τΔT is about 1 ms.
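The weighting described in the abstract can be illustrated with a short sketch. It assumes the weight has the form w_n = n^s · exp(−n² / (2τ²)) and that the distance is the sum of squared weighted cepstral differences; the function names, the default `tau=8.0` (which would correspond to τΔT = 1 ms only at an assumed 8 kHz sampling rate), and the NumPy-based layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def group_delay_weights(order, s=1.0, tau=8.0):
    """Assumed weight form: w_n = n**s * exp(-n**2 / (2 * tau**2)), n = 1..order."""
    n = np.arange(1, order + 1, dtype=float)
    return n**s * np.exp(-(n**2) / (2.0 * tau**2))

def weighted_cepstral_distance(c_ref, c_test, s=1.0, tau=8.0):
    """Squared weighted distance between two cepstral vectors (c_1..c_N).

    Hypothetical helper: the 0th coefficient is assumed to be excluded,
    and tau is expressed in samples, not in milliseconds.
    """
    c_ref = np.asarray(c_ref, dtype=float)
    c_test = np.asarray(c_test, dtype=float)
    w = group_delay_weights(c_ref.shape[-1], s, tau)
    diff = w * (c_ref - c_test)
    return float(np.sum(diff**2))
```

Under these assumptions, setting `s=0` and letting `tau` grow very large makes every weight equal to 1, so the measure reduces to the squared Euclidean cepstral distance used as the baseline in the abstract. In a dynamic time warping recognizer, a function like this would serve as the local frame-to-frame cost.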
Keywords
Additive noise; Cepstral analysis; Character recognition; Delay; Frequency measurement; Linear predictive coding; Measurement standards; Noise measurement; Signal to noise ratio; Speech recognition;
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87.
Type
conf
DOI
10.1109/ICASSP.1987.1169476
Filename
1169476