Title :
USTC95-a Putonghua corpus
Author :
Wang, Ren-Hua ; Xia, Deyu ; Ni, Jinfu ; Liu, Bicheng
Author_Institution :
Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
For the standard spoken Chinese dialect commonly known as Putonghua or Mandarin, a large corpus called USTC95 (University of Science and Technology of China, '95) is introduced, which is primarily designed to support research in Chinese speech recognition and analysis and in recognition system evaluation. This corpus consists of four major sub-corpora, corresponding to isolated syllables, multi-syllable words, sentences and telephone speech. With an elaborate design, the corpus encompasses all the phones and mono-syllables, as well as the co-articulation effects in Putonghua; also, it keeps as little redundancy as possible. This parsimonious corpus makes it possible to acquire acoustic-phonetic knowledge for isolated word recognition and continuous Chinese recognition, to provide speech data for training a telephone speech recognizer, and also to provide a common test base for the performance assessment of the recognizer.
Keywords :
database management systems; languages; speech recognition; Chinese speech recognition; Mandarin Chinese; Putonghua corpus; USTC95; acoustic-phonetic knowledge acquisition; coarticulation effects; continuous speech recognition; isolated syllables; isolated word recognition; mono-syllables; multi-syllable words; parsimonious corpus; performance assessment; phones; redundancy; sentences; speech recognition system evaluation; sub-corpora; telephone speech; training; Acoustic testing; Isolation technology; Kernel; Natural languages; Speech analysis; Speech processing; Speech recognition; System testing; Target recognition; Telephony
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.608003