DocumentCode
1909595
Title
A Comparative Study of Diverse Knowledge Sources and Smoothing Techniques via Maximum Entropy for Polyphone Disambiguation in Mandarin TTS Systems
Author
Mao, Xinnian ; Dong, Yuan ; Han, Jinyu ; Wang, Haila
Author_Institution
France Telecom R&D Center, Beijing
fYear
2007
fDate
Aug. 30 2007-Sept. 1 2007
Firstpage
162
Lastpage
169
Abstract
This paper comparatively evaluated various knowledge sources and smoothing algorithms for pronunciation disambiguation in Mandarin TTS (text-to-speech) systems under the maximum entropy (maxent) framework. In particular, five kinds of knowledge sources, namely characters and their pronunciations, and words with their pronunciations and parts of speech, together with two smoothing algorithms, i.e., Gaussian prior and inequality smoothing, were compared. In our experiments conducted on 107 key Chinese polyphones, we found that all the knowledge sources perform almost equally well given the same smoothing method, but the character-based features compare favorably because they are language independent and can be obtained at the lowest computational cost. Compared with the widely used Gaussian smoothing, the inequality smoothing greatly reduces the number of active features and yields slightly improved accuracy on each knowledge source. Our best result (96.36%) is achieved by using character-based features together with the inequality smoothing, significantly superior to the 81.22% obtained by selecting the most frequent pronunciation and the 88.72% obtained by dictionary look-up with part-of-speech. We also compared the maxent classifier with the transformation-based error-driven learning algorithm (E. Brill, 1995) using the same knowledge sources; the results show that the maxent classifier achieves better performance on polyphone disambiguation.
Keywords
Gaussian processes; maximum entropy methods; smoothing methods; speech processing; Chinese polyphones; Gaussian prior algorithm; Mandarin TTS systems; character-based features; diverse knowledge sources; inequality algorithm; maxent classifier; maxent framework; maximum entropy; polyphone disambiguation; pronunciation disambiguation; smoothing algorithms; smoothing techniques; text-to-speech system; transform-based error-driven learning algorithm; Entropy; Impedance matching; Information analysis; Natural languages; Research and development; Smoothing methods; Speech synthesis; Telecommunications; Text analysis; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-1611-0
Electronic_ISBN
978-1-4244-1611-0
Type
conf
DOI
10.1109/NLPKE.2007.4368028
Filename
4368028