DocumentCode :
2309074
Title :
Speech Recognition Using Syllable Duration Ratio Model
Author :
Ariu, Masahide ; Masuko, Takashi ; Tanaka, Shinichi ; Kawamura, Akinori
Author_Institution :
Corp. Res. & Dev. Center, Toshiba Corp.
Volume :
1
fYear :
2006
fDate :
14-19 May 2006
Abstract :
This paper describes a novel approach to modeling duration information for speech recognition. To eliminate the influence of speaking rate on the duration model, we propose a model that represents the duration ratios of two successive syllables with log-normal distributions. We refer to this model as a syllable duration ratio model (SDRM) and compare it with a syllable duration model (SDM) that represents the duration of each syllable itself. The two duration models are compared in isolated-word and connected-digit recognition tasks under noisy conditions. Experimental results show that the SDRM outperformed the SDM and reduced errors by approximately 30% compared to the baseline system without a duration model at SNRs of 15 dB or higher in 10-digit recognition tasks. In addition, we show that the SDRM is robust to differences in speaking rate between training and test data.
Keywords :
log-normal distribution; speech recognition; SNR; digit recognition tasks; syllable duration ratio model; Degradation; Hidden Markov models; Linear regression; Rhythm; Robustness; Testing; Training data
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
ISSN :
1520-6149
Print_ISBN :
1-4244-0469-X
Type :
conf
DOI :
10.1109/ICASSP.2006.1660027
Filename :
1660027