Grapheme-to-phoneme conversion methods for minority language conditions

Author

Mengxue Cao ; Renals, Steve ; Bell, P. ; Aijun Li ; Qiang Fang

Author_Institution

Phonetics Lab., Chinese Acad. of Social Sci., Beijing, China

fYear

2012

fDate

9-12 Dec. 2012

Firstpage

151

Lastpage

156

Abstract

This study investigates grapheme-to-phoneme conversion approaches for minority language conditions. Whereas isolated-word data is the usual form of training data for major languages, sentence-form data is taken here to be the appropriate form for minority languages. Two models were examined: the Joint-multigram Model and the Hidden Markov Model. The “treat-sentence-as-word” training method and a forced-alignment process are proposed to extend the Joint-multigram Model and the Hidden Markov Model, respectively, to meet minority language conditions. Results obtained from sentence-form training data with the proposed methods are as good as those obtained from isolated-word training data with previously proposed methods. The Joint-multigram Model performs better on well-designed training data, while the Hidden Markov Model has greater error capacity and is better suited to minority language conditions.

Keywords

hidden Markov models; natural language processing; speech processing; speech recognition; speech synthesis; word processing; error capacity; forced-alignment process; grapheme-to-phoneme conversion methods; hidden Markov model; joint-multigram model; minority language conditions; sentence-form training data; treat-sentence-as-word training method; Context modeling; Data models; Hidden Markov models; Speech; Speech recognition; Training; Training data; Grapheme-to-phoneme; HMM; Joint-multigram Model; forced-alignment; treat-sentence-as-word

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on

Conference_Location

Macau

Print_ISBN

978-1-4673-2811-1

Electronic_ISBN

978-1-4673-2812-8

Type

conf

DOI

10.1109/ICSDA.2012.6422470

Filename

6422470