Regional Information Center for Science and Technology (RICeST) - An efficient algorithm to select phonetically balanced scripts for constructing a speech corpus

DocumentCode :

2665285

Title :

An efficient algorithm to select phonetically balanced scripts for constructing a speech corpus

Author :

Liang, Min-song ; Lyu, Ren-Yuan ; Chiang, Yuang-chin

Author_Institution :

Dept. of Electr. Eng., Chang Gung Univ., Taoyuan, China

fYear :

2003

fDate :

26-29 Oct. 2003

Firstpage :

433

Lastpage :

437

Abstract :

We describe an efficient algorithm for selecting phonetically balanced scripts for collecting a large-scale multilingual speech corpus. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. The first step toward this objective is to construct a multilingual phonetic alphabet, the Formosa phonetic alphabet (ForPA). The multilingual lexicons (Formosa lexicons) are also important components in building the corpus. To date, the portion of the corpus containing speech from 600 speakers in Taiwanese (Min-nan) and Mandarin Chinese has been completed and is ready for release. This release contains about 40 hours of speech in 247 thousand utterances.

Keywords :

audio databases; linguistics; natural languages; speech processing; word processing; Formosa lexicon; Formosa phonetic alphabet; multilingual lexicon; multilingual phonetic alphabet; multilingual speech corpus; phonetically-balanced word; pronunciation lexicon; Buildings; Computer science; Databases; Large-scale systems; Natural languages; Research and development; Speech processing; Speech recognition; Speech synthesis; Statistics

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on

Conference_Location :

Beijing, China

Print_ISBN :

0-7803-7902-0

Type :

conf

DOI :

10.1109/NLPKE.2003.1275945

Filename :

1275945

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2665285