DocumentCode :
3530339
Title :
Modelling the prepausal lengthening effect for speech recognition: a dynamic Bayesian network approach
Author :
Ma, Ning ; Bartels, Chris D. ; Bilmes, Jeff A. ; Green, Phil D.
Author_Institution :
Dept. of Comput. Sci., Univ. of Sheffield, Sheffield
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4617
Lastpage :
4620
Abstract :
In speech, the unit preceding a pause tends to lengthen. This work uses a dynamic Bayesian network to model this prepausal lengthening effect for robust speech recognition. Specifically, we introduce two distributions to model inter-state transitions in prepausal and non-prepausal words, respectively. The choice of transition distribution depends on a random variable whose value is influenced by whether a pause will appear between the current and the following word. Two experiments are presented. The first considers pauses hypothesised during speech decoding. The second employs an extra component for speech/non-speech determination. By modelling the prepausal lengthening effect, we achieve a 5.5% relative reduction in word error rate on the 500-word task of the SVitchboard corpus.
Keywords :
belief networks; speech recognition; Bayesian network; prepausal lengthening effect; prosody; Automatic speech recognition; Bayesian methods; Computer science; Decoding; Error analysis; Mel frequency cepstral coefficient; Noise robustness; Random variables; Speech analysis; Prepausal lengthening; duration; dynamic Bayesian networks; robust speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960659
Filename :
4960659