DocumentCode
2180494
Title
Performance of connected digit recognizers with context-dependent word duration modeling
Author
Kwon, Oh Wook ; Un, Chong Kwan
Author_Institution
Spoken Language Processing Sect., ETRI, Taejon, South Korea
fYear
1996
fDate
18-21 Nov 1996
Firstpage
243
Lastpage
246
Abstract
In a Korean connected digit recognizer, insertion and deletion errors account for about half of the total recognition errors because the Korean language has two monophonemic digits. Previous studies showed that these errors are not corrected even by discriminative training algorithms. To reduce them, we propose to model context-dependent word duration information and incorporate it directly in the decoding algorithm. Experimental results show that, while incorporating duration information in the postprocessing stage does not achieve significant improvements over a baseline system, the proposed method reduces word error rates by as much as 10% for unknown-length decoding when the recognizer is trained with the maximum likelihood estimation and generalized probabilistic descent methods. Furthermore, simple duration modeling with a bounded uniform distribution achieves performance improvements comparable to those of detailed duration modeling with a gamma or Gaussian distribution, and hence it is a good compromise between performance and complexity.
Keywords
Gaussian distribution; decoding; errors; gamma distribution; maximum likelihood estimation; probability; speech coding; speech recognition; Korean language; bounded uniform distribution; connected digit recognizers; context-dependent word duration modeling; decoding algorithm; deletion errors; duration information; generalized probabilistic descent method; insertion errors; monophonemic digits; postprocessing stage; recognition errors; word error rates; Context modeling; Error analysis; Error correction; Hidden Markov models; Maximum likelihood decoding; Natural languages; Pattern recognition; Probability distribution
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits and Systems, 1996., IEEE Asia Pacific Conference on
Conference_Location
Seoul
Print_ISBN
0-7803-3702-6
Type
conf
DOI
10.1109/APCAS.1996.569264
Filename
569264