Title :
MAM: A multi-accent Mandarin corpus
Author :
Huang, Chung-Hsiang ; Chen, Chia-Ping
Author_Institution :
Dept. of Comput. Sci. & Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
Abstract :
This paper describes the design, collection, and labeling of a Multi-Accent Mandarin corpus called MAM. MAM is constructed primarily for automatic Chinese accent detection. We aim to study the idiosyncrasies of the Chinese accents of people from different regions of the world. The text in the corpus is designed to be easily readable for international Chinese-speaking students studying in Taiwan, while maintaining sufficient linguistic complexity. The speech data in MAM is collected from overseas students currently enrolled at National Sun Yat-sen University. There are 30 speakers with 40 utterances each, for a total of 1,200 utterances. The recruited speakers are divided geographically into three groups: 10 speakers (6 male and 4 female) are from Indonesia, 10 speakers (5 male and 5 female) are from Malaysia, and 10 speakers (5 male and 5 female) are from Hong Kong or Macau. In addition, we select 10 speakers (5 male and 5 female) from Taiwan from the TCC-300 corpus. The utterances in MAM are manually analyzed for actual pronunciation, and a rudimentary accent classifier is constructed based on the pronunciation variation patterns across the different geographic areas.
Keywords :
natural language processing; speech processing; MAM; National Sun Yat-sen University; TCC-300 corpus; automatic Chinese accent detection; international Chinese-speaking students; linguistic complexity; multiaccent Mandarin corpus; pronunciation variation patterns; rudimentary accent classifier; accent detection; multi-accent Mandarin corpus
Conference_Title :
Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on
Conference_Location :
Hsinchu
Print_ISBN :
978-1-4577-0930-2
DOI :
10.1109/ICSDA.2011.6085997