DocumentCode :
2541948
Title :
Improved Large Vocabulary Mandarin Speech Recognition Using Prosodic and Lexical Information in Maximum Entropy Framework
Author :
Ni, Chongjia ; Liu, Wenju ; Xu, Bo
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
fYear :
2009
fDate :
4-6 Nov. 2009
Firstpage :
1
Lastpage :
4
Abstract :
Tone plays an important role in distinguishing ambiguous words in Mandarin Chinese speech recognition. In this paper, we make full use of pitch information in two ways. First, we interpolate the F0 contour so that it is continuous across voiced and unvoiced segments, which allows F0 to be embedded into the speech recognition system as a second feature stream: the cepstrum and its first- and second-order derivatives constitute one stream, and F0 and its first- and second-order derivatives make up the other. Second, we use prosodic and lexical features, together with syllable context information, within a maximum entropy framework to build an explicit tone model for rescoring the first-pass output lattice. Experimental results show that pitch information and tonal cues greatly reduce substitution errors and achieve a 3.65% absolute reduction in Chinese character error rate (CER) on a widely used Mandarin speech recognition task, the 863 test.
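The two-stream front end described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the F0 contour is interpolated or how the derivatives are computed, so linear interpolation, `np.gradient` differences, the convention that unvoiced frames carry F0 = 0, and the function names `interpolate_f0` and `add_deltas` are all assumptions made here for illustration; the cepstral values are random placeholders.

```python
import numpy as np

def interpolate_f0(f0):
    """Fill unvoiced frames (assumed to be marked as 0) by linear
    interpolation between the neighbouring voiced frames."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        return f0.copy()  # nothing to interpolate from
    idx = np.arange(len(f0))
    # np.interp holds the first/last voiced value at the utterance edges.
    return np.interp(idx, idx[voiced], f0[voiced])

def add_deltas(x):
    """Append first- and second-order differences to a (frames, dims) array.
    np.gradient is a simple stand-in for a regression-based delta window."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d1 = np.gradient(x, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.hstack([x, d1, d2])

# Toy F0 track in Hz; zeros mark unvoiced frames (hypothetical data).
f0 = [0, 0, 120, 130, 0, 0, 160, 0]
f0_cont = interpolate_f0(f0)

# Stream 2: continuous F0 plus its first and second derivatives.
pitch_stream = add_deltas(f0_cont)

# Stream 1: cepstral features plus derivatives (random placeholder values).
cepstra = np.random.default_rng(0).normal(size=(8, 13))
cepstral_stream = add_deltas(cepstra)

print(f0_cont)
print(pitch_stream.shape, cepstral_stream.shape)
```

Keeping the two streams as separate arrays mirrors the abstract's description of modeling them as distinct streams rather than concatenating them into a single feature vector.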
Keywords :
maximum entropy methods; natural languages; speech recognition; Chinese character error rate reduction; Mandarin speech recognition; lexical information; maximum entropy framework; pitch information; prosodic information; syllable context information; tone modeling; Automatic speech recognition; Automation; Cepstrum; Context modeling; Entropy; Laboratories; Lattices; Pattern recognition; Speech recognition; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2009. CCPR 2009. Chinese Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4199-0
Type :
conf
DOI :
10.1109/CCPR.2009.5344045
Filename :
5344045