A Comparative Study of Diverse Knowledge Sources and Smoothing Techniques via Maximum Entropy for Polyphone Disambiguation in Mandarin TTS Systems

Author

Mao, Xinnian; Dong, Yuan; Han, Jinyu; Wang, Haila

Author_Institution

France Telecom R&D Center, Beijing

fYear

2007

fDate

Aug. 30 2007-Sept. 1 2007

Firstpage

162

Lastpage

169

Abstract

This paper comparatively evaluates various knowledge sources and smoothing algorithms for pronunciation disambiguation in Mandarin TTS (text-to-speech) systems under the maximum entropy (maxent) framework. In particular, five kinds of knowledge sources (characters and their pronunciations, words and their pronunciations, and part-of-speech tags) and two smoothing algorithms (the Gaussian prior and inequality smoothing) are compared. In experiments conducted on 107 key Chinese polyphones, we found that all the knowledge sources perform almost equally well given the same smoothing method, but the character-based features compare favorably because they are language independent and can be obtained at the lowest computational cost. Compared with the widely used Gaussian smoothing, inequality smoothing greatly reduces the number of active features and yields slightly improved accuracy on each knowledge source. Our best result (96.36%) is achieved by using character-based features together with inequality smoothing, which is significantly superior to the 81.22% obtained by selecting the most frequent pronunciation and the 88.72% obtained by dictionary look-up with part-of-speech. We also compared the maxent classifier with the transformation-based error-driven learning algorithm (E. Brill, 1995) using the same knowledge sources; the results show that the maxent classifier achieves better performance on polyphone disambiguation.
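
As a rough illustration of the setup the abstract describes, the sketch below trains a maxent (multinomial logistic) classifier on character-context features with a Gaussian prior, and compares it with the most-frequent-pronunciation baseline. It is not the authors' implementation: the toy sentences for the polyphone 行, the feature template, the window size, and the names `char_features`, `train_maxent`, `sigma2`, `lr`, and `epochs` are all assumptions made for illustration. Inequality smoothing is not covered here.

```python
import math
from collections import Counter, defaultdict

def char_features(sentence, index, window=1):
    """Neighbouring characters around the polyphone at `index` (assumed template)."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        ch = sentence[pos] if 0 <= pos < len(sentence) else "<pad>"
        feats.append(f"c[{offset}]={ch}")
    return feats

def most_frequent_baseline(train):
    """Baseline: always choose the pronunciation seen most often in training."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def predict_proba(weights, feats, labels):
    """Softmax over the summed weights of the active (feature, label) pairs."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train_maxent(train, labels, sigma2=1.0, lr=0.1, epochs=200):
    """Gradient ascent on log-likelihood minus sum(w^2) / (2 * sigma2)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in train:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0          # observed feature count
                for y in labels:
                    grad[(f, y)] -= probs[y]    # expected feature count
        for key in set(grad) | set(weights):
            # The Gaussian prior contributes the -w / sigma2 term.
            weights[key] += lr * (grad[key] - weights[key] / sigma2)
    return weights

def predict(weights, feats, labels):
    probs = predict_proba(weights, feats, labels)
    return max(probs, key=probs.get)

# Hypothetical toy data: (sentence, position of the polyphone, pronunciation).
RAW = [
    ("银行开门", 1, "hang2"), ("银行存款", 1, "hang2"), ("一行文字", 1, "hang2"),
    ("行走江湖", 0, "xing2"), ("步行回家", 1, "xing2"), ("旅行愉快", 1, "xing2"),
    ("行人很多", 0, "xing2"),
]
LABELS = ["hang2", "xing2"]
TRAIN = [(char_features(s, i), y) for s, i, y in RAW]

baseline_label = most_frequent_baseline(TRAIN)
model = train_maxent(TRAIN, LABELS)
maxent_label = predict(model, char_features("银行关门", 1), LABELS)
```

On this toy set the baseline always answers with the majority pronunciation, while the maxent model can use the preceding character 银 to choose the other reading. A smaller `sigma2` pulls the weights more strongly toward zero.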

Keywords

Gaussian processes; maximum entropy methods; smoothing methods; speech processing; Chinese polyphones; Gaussian prior algorithm; Mandarin TTS systems; character-based features; diverse knowledge sources; inequality algorithm; maxent classifier; maxent framework; maximum entropy; polyphone disambiguation; pronunciation disambiguation; smoothing algorithms; smoothing techniques; text-to-speech system; transform-based error-driven learning algorithm; Entropy; Impedance matching; Information analysis; Natural languages; Research and development; Smoothing methods; Speech synthesis; Telecommunications; Text analysis; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-1611-0

Electronic_ISBN

978-1-4244-1611-0

Type

conf

DOI

10.1109/NLPKE.2007.4368028

Filename

4368028