DocumentCode :
2665285
Title :
An efficient algorithm to select phonetically balanced scripts for constructing a speech corpus
Author :
Liang, Min-song ; Lyu, Ren-Yuan ; Chiang, Yuang-chin
Author_Institution :
Dept. of Electr. Eng., Chang Gung Univ., Taoyuan, China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
433
Lastpage :
437
Abstract :
We describe an efficient algorithm for selecting phonetically balanced scripts for collecting a large-scale multilingual speech corpus. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. The first step toward this objective is to construct a multilingual phonetic alphabet, the Formosa Phonetic Alphabet (ForPA). The multilingual lexicons (Formosa lexicons) are also important components in building the corpus. To date, the portion of the corpus containing speech from 600 speakers of Taiwanese (Min-nan) and Mandarin Chinese has been completed and is ready for release. This release contains about 40 hours of speech in 247 thousand utterances.
Keywords :
audio databases; linguistics; natural languages; speech processing; word processing; Formosa lexicon; Formosa phonetic alphabet; multilingual lexicon; multilingual phonetic alphabet; multilingual speech corpus; phonetically-balanced word; pronunciation lexicon; Buildings; Computer science; Databases; Large-scale systems; Natural languages; Research and development; Speech processing; Speech recognition; Speech synthesis; Statistics
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275945
Filename :
1275945