Regional Information Center for Science and Technology - Speech Recognition Using Syllable Duration Ratio Model

DocumentCode :

2309074

Title :

Speech Recognition Using Syllable Duration Ratio Model

Author :

Ariu, Masahide ; Masuko, Takashi ; Tanaka, Shinichi ; Kawamura, Akinori

Author_Institution :

Corp. Res. & Dev. Center, Toshiba Corp.

Volume :

fYear :

2006

fDate :

14-19 May 2006

Abstract :

This paper describes a novel approach to duration information modeling for speech recognition. To eliminate the influence of speaking rate on the duration model, we propose a model that represents the duration ratios of two successive syllables with log-normal distributions. We refer to this model as a syllable duration ratio model (SDRM), and compare it with a syllable duration model (SDM) that represents the duration of the syllable itself. These duration models are compared in isolated word and connected digit recognition tasks under noisy conditions. Experimental results show that the SDRM outperformed the SDM, and reduced the errors by approximately 30% compared to the baseline system without a duration model at 15 dB or higher SNR in 10 digits recognition tasks. In addition, we show that the SDRM is robust with respect to the difference in speaking rate between training and test data.
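
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example durations, and the (mu, sigma) parameters are all hypothetical, and the abstract does not say how the score is combined with the acoustic model. The sketch only shows why a ratio of successive syllable durations is insensitive to a uniform change in speaking rate, and how such ratios could be scored under log-normal distributions.

```python
import math


def log_duration_ratios(durations):
    """Log of the ratio between each syllable duration and the previous one."""
    return [math.log(b / a) for a, b in zip(durations, durations[1:])]


def log_normal_log_density(ratio, mu, sigma):
    """Log density of a log-normal distribution (parameters mu, sigma) at `ratio`."""
    z = (math.log(ratio) - mu) / sigma
    return -math.log(ratio * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def ratio_model_log_score(durations, params):
    """Sum of log densities over successive duration ratios.

    `params` holds one hypothetical (mu, sigma) pair per successive
    syllable pair; a real system would estimate these from training data.
    """
    pairs = zip(durations, durations[1:])
    return sum(
        log_normal_log_density(b / a, mu, sigma)
        for (a, b), (mu, sigma) in zip(pairs, params)
    )


# Made-up syllable durations in seconds, and the same utterance spoken twice as fast.
durations = [0.20, 0.30, 0.15]
fast_durations = [d * 0.5 for d in durations]

# Made-up model parameters for the two successive-syllable pairs.
params = [(0.4, 0.3), (-0.7, 0.3)]

normal_score = ratio_model_log_score(durations, params)
fast_score = ratio_model_log_score(fast_durations, params)
```

Because every duration is scaled by the same factor, the ratios and therefore the two scores agree up to floating-point rounding, whereas a model of the raw durations would score the fast utterance differently.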

Keywords :

log-normal distribution; speech recognition; SNR; digit recognition tasks; syllable duration ratio model; Degradation; Hidden Markov models; Linear regression; Rhythm; Robustness; Testing; Training data

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location :

Toulouse

ISSN :

1520-6149

Print_ISBN :

1-4244-0469-X

Type :

conf

DOI :

10.1109/ICASSP.2006.1660027

Filename :

1660027

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2309074