MAM: A multi-accent Mandarin corpus

Author

Huang, Chung-Hsiang ; Chen, Chia-Ping

Author_Institution

Dept. of Comput. Sci. & Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan

fYear

2011

fDate

26-28 Oct. 2011

Firstpage

147

Lastpage

151

Abstract

This paper describes the design, collection, and labeling of a Multi-Accent Mandarin corpus called MAM. MAM is constructed primarily for automatic Chinese accent detection. We aim to study the idiosyncrasies of the Chinese accents of people from different regions of the world. The text in the corpus is designed to be easily readable for international Chinese-speaking students studying in Taiwan while maintaining sufficient linguistic complexity. The speech data in MAM are collected from overseas students currently enrolled at National Sun Yat-sen University. There are 30 speakers with 40 utterances each, for a total of 1,200 utterances. The recruited speakers are geographically divided into three groups: 10 speakers (6 male and 4 female) are from Indonesia, 10 speakers (5 male and 5 female) are from Malaysia, and 10 speakers (5 male and 5 female) are from Hong Kong or Macau. In addition, we have selected 10 speakers (5 male and 5 female) from Taiwan in the TCC-300 corpus. The utterances in MAM are manually analyzed for actual pronunciation, and a rudimentary accent classifier based on the pronunciation variation patterns across different geographic areas is constructed.

Keywords

natural language processing; speech processing; MAM; National Sun Yat-sen University; TCC-300 corpus; automatic Chinese accent detection; international Chinese-speaking students; linguistic complexity; pronunciation variation patterns; rudimentary accent classifier; accent detection; multi-accent Mandarin corpus

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on

Conference_Location

Hsinchu

Print_ISBN

978-1-4577-0930-2

Type

conf

DOI

10.1109/ICSDA.2011.6085997

Filename

6085997