Improving phoneme and accent estimation by leveraging a dictionary for a stochastic TTS front-end

Author

Nagano, Tohru ; Tachibana, Ryuki ; Itoh, Nobuyasu ; Nishimura, Masafumi

Author_Institution

Tokyo Res. Lab., IBM Res., Yamato

fYear

2008

fDate

March 31 2008-April 4 2008

Firstpage

4689

Lastpage

4692

Abstract

Determining the correct phonemes and pitch accents is essential for synthesizing natural-sounding Japanese speech. We implemented a TTS front-end system based on a word n-gram model. However, the vocabulary of a word n-gram model is limited to the words found in the training corpus, and collecting a very large training corpus is difficult. In this paper, we propose adding a class n-gram model that covers not only the words found in the training corpus but also the words found in a dictionary, further improving accuracy. In our experiments, compared with the word n-gram model, the proposed model yields a relative improvement of 16.9% in accent estimation accuracy and 21.6% in phoneme estimation accuracy.
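
The abstract does not state the model equations. As a rough illustration of how a class n-gram can extend a word n-gram to dictionary-only words, the sketch below assumes a linear interpolation of the two models (the keywords mention "Interpolated LM"), bigrams instead of general n-grams, and a uniform P(word | class). The toy corpus, the class labels, the weight `lam`, and the helper names `train_bigram` and `interpolated_prob` are all hypothetical and not taken from the paper.

```python
from collections import Counter


def train_bigram(pairs):
    """Maximum-likelihood P(current | previous) from (previous, current) pairs."""
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    return {(p, c): n / prev_counts[p] for (p, c), n in pair_counts.items()}


def interpolated_prob(prev, word, word_bigram, class_bigram, word_class, emit, lam=0.5):
    """lam * P_word(word | prev) + (1 - lam) * P_class(c | c_prev) * P(word | c)."""
    p_word = word_bigram.get((prev, word), 0.0)
    c_prev, c_cur = word_class.get(prev), word_class.get(word)
    p_class = class_bigram.get((c_prev, c_cur), 0.0) * emit.get((c_cur, word), 0.0)
    return lam * p_word + (1 - lam) * p_class


# Hypothetical toy data: "fox" is listed in the dictionary but never
# appears in the training corpus.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
word_class = {  # assumed class assignment covering corpus and dictionary words
    "the": "DET", "cat": "NOUN", "dog": "NOUN", "fox": "NOUN", "sleeps": "VERB",
}

word_pairs = [(s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1)]
class_pairs = [(word_class[a], word_class[b]) for a, b in word_pairs]
word_bigram = train_bigram(word_pairs)
class_bigram = train_bigram(class_pairs)

# Assumption: P(word | class) is uniform over the members of each class.
class_sizes = Counter(word_class.values())
emit = {(c, w): 1.0 / class_sizes[c] for w, c in word_class.items()}

p_cat = interpolated_prob("the", "cat", word_bigram, class_bigram, word_class, emit)
p_fox = interpolated_prob("the", "fox", word_bigram, class_bigram, word_class, emit)
print(f"P(cat | the) = {p_cat:.4f}")
print(f"P(fox | the) = {p_fox:.4f}")
```

In this toy setup the word bigram alone gives "fox" zero probability after "the", while the interpolated model gives it a nonzero share through its class. A real front-end would presumably attach phoneme and accent labels to each unit and tune the interpolation weight on held-out data.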

Keywords

dictionaries; natural language processing; speech processing; stochastic processes; TTS front-end system; accent estimation; dictionary; natural Japanese speech; phonemes; pitch accents; stochastic TTS front-end; training corpus; vocabulary; word n-gram model; Context modeling; Dictionaries; Laboratories; Natural languages; Predictive models; Scalability; Speech synthesis; Stochastic processes; Tagging; Vocabulary; Interpolated LM; Japanese accent; Speech synthesis; TTS front-end; Word clustering

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location

Las Vegas, NV

ISSN

1520-6149

Print_ISBN

978-1-4244-1483-3

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2008.4518703

Filename

4518703