DocumentCode :
417264
Title :
Phone duration modeling for LVCSR
Author :
Povey, D.
Author_Institution :
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
Modeling phone durations in a word-specific fashion has previously been shown to lead to improvements in LVCSR recognition performance. We report results on the Switchboard database which confirm that at least small improvements (around 0.2-0.3% absolute) can be obtained. The duration probabilities are applied to time-marked recognition lattices. Features of the system include a novel data-driven method for smoothing discrete distributions, and a form of discrete distribution which allows phone and word lengths to be modeled simultaneously within a consistent probabilistic framework.
Keywords :
Gaussian distribution; smoothing methods; speech coding; speech recognition; LVCSR; Switchboard database; data-driven method; discrete distribution smoothing; duration probabilities; phone duration modeling; probabilistic framework; speech recognition performance; time-marked recognition lattices; word lengths; word-specific fashion; Character generation; Chromium; Frequency; Gaussian processes; Hidden Markov models; Lattices; Probability distribution; Smoothing methods; Spatial databases; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326114
Filename :
1326114