Title :
Comparison of two tree-structured approaches for grapheme-to-phoneme conversion
Author :
Andersen, Ove ; Kuhn, Roland ; Lazarides, Ariane ; Dalsgaard, Paul ; Haas, Jurgen ; Noth, Elmar
Author_Institution :
Center for PersonKommunikation, Aalborg Univ., Denmark
Abstract :
Recently, we described a two-step self-learning approach for grapheme-to-phoneme (G2P) conversion (O. Andersen and P. Dalsgaard, 1995). In the first step, grapheme and phoneme strings in the training data are aligned via an iterative Viterbi procedure that may insert graphemic and phonemic nulls where required. In the second step, a trie structure encoding pronunciation rules is generated. We describe the alignment module and give alignment accuracies on the NETtalk database. We also compare transcription accuracies for two approaches to the second step on three databases: the NETtalk database, the CMU dictionary, and the French part of the ONOMASTICA lexicon. The two transcription approaches applied in this research are a trie approach and an approach based on binary decision trees grown by means of the Gelfand-Ravishankar-Delp algorithm (L. Breiman et al., 1984; S. Gelfand et al., 1991; R. Kuhn et al., 1995). We discuss the choice of questions for these decision trees: it may be possible to formulate questions about groups of characters (e.g., “is the next letter a vowel?”) that yield better trees than those that only use questions about individual characters (e.g., “is the next letter an 'A'?”). Finally, we discuss the implications of our work for G2P conversion.
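The second step can be illustrated with a minimal sketch of a trie that maps a letter and its context to a phoneme. This is not the authors' implementation: the order in which context letters are visited (focus letter, then neighbours alternating right and left), the back-off to the deepest matching node, the `_` null symbol, the `#` word-boundary pad, and the toy lexicon are all assumptions made for illustration. The sketch also assumes that alignment has already produced one phoneme (possibly a null) per letter.

```python
from collections import Counter

NULL = "_"  # hypothetical symbol for a phonemic null produced by alignment
PAD = "#"   # hypothetical word-boundary symbol


class TrieNode:
    def __init__(self):
        self.children = {}
        self.counts = Counter()  # phoneme counts observed for this context


def context_keys(letters, i, depth):
    """Yield the focus letter, then neighbours alternating right, left."""
    def at(j):
        return letters[j] if 0 <= j < len(letters) else PAD

    yield letters[i]
    for d in range(1, depth + 1):
        yield at(i + d)
        yield at(i - d)


def train(aligned_words, depth=2):
    """Build the trie from (letters, phonemes) pairs of equal length."""
    root = TrieNode()
    for letters, phonemes in aligned_words:
        for i, phoneme in enumerate(phonemes):
            node = root
            for key in context_keys(letters, i, depth):
                node = node.children.setdefault(key, TrieNode())
                node.counts[phoneme] += 1
    return root


def transcribe(root, letters, depth=2):
    """Use the most frequent phoneme at the deepest matching context."""
    output = []
    for i in range(len(letters)):
        node, best = root, None
        for key in context_keys(letters, i, depth):
            node = node.children.get(key)
            if node is None:
                break  # back off to the last context that was seen
            best = node.counts.most_common(1)[0][0]
        if best is not None and best != NULL:
            output.append(best)  # nulls produce no output phoneme
    return output


# Toy aligned lexicon (invented for this sketch, not taken from any database).
lexicon = [
    ("cat", ["k", "ae", "t"]),
    ("city", ["s", "ih", "t", "iy"]),
    ("cake", ["k", "ey", "k", NULL]),
]
root = train(lexicon)
print(transcribe(root, "cake"))
print(transcribe(root, "cit"))
```

In this sketch an unseen word such as "cit" is still transcribed, because each letter falls back to the longest context that was observed in training.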
Keywords :
decision theory; directed graphs; encoding; natural languages; speech recognition; speech synthesis; tree data structures; trees (mathematics); CMU dictionary; G2P conversion; Gelfand-Ravishankar-Delp algorithm; NETtalk database; ONOMASTICA lexicon; Trie structure; alignment accuracies; alignment module; binary decision trees; decision trees; grapheme to phoneme conversion; graphemic nulls; iterative Viterbi procedure; phoneme strings; phonemic nulls; pronunciation rule encoding; training data; transcription accuracies; transcription approaches; tree structured approaches; two step self learning approach; Databases; Decision trees; Dictionaries; Encoding; Knowledge based systems; Speech recognition; Speech synthesis; Training data; Tree graphs; Viterbi algorithm;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607954