DocumentCode :
3713056
Title :
Context-dependent grapheme-to-phoneme evaluation corpus using flexible contexts and Categorial Matrix
Author :
Chatchawarn Hansakunbuntheung;Sumonmas Thatphithakkul
Author_Institution :
Speech and Audio Technology Laboratory, National Electronics and Computer Technology Center (NECTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
fYear :
2015
Firstpage :
160
Lastpage :
165
Abstract :
Context-dependent pronunciation, as in homographs, is a difficult problem for grapheme-to-phoneme (G2P) conversion and degrades accuracy in speech synthesis and speech recognition. However, it is rarely considered when collecting pronunciation corpora for evaluating G2P accuracy. This paper therefore proposes a context-dependent pronunciation corpus for G2P assessment that uses grapheme-phoneme pairs together with their context information. The context information includes: 1) a Categorial Matrix that represents the orthographic types and usage domains of orthographic groups (OGs) and is designed for investigating problem categories in G2P; 2) regular-expression-based flexible contexts that represent context variation; and 3) OG Classes that represent interchangeable OGs within a flexible context. The flexible contexts and the word classes are designed to remove redundant contexts while covering context variation with minimal sets of patterns. With the proposed corpus, automatic context generation for G2P evaluation can be implemented.
Keywords :
"Context","Syntactics","Tagging","Speech","Dictionaries"
Publisher :
IEEE
Conference_Titel :
Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference
Type :
conf
DOI :
10.1109/ICSDA.2015.7357884
Filename :
7357884