Title :
An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling
Author :
Yeh, Ching-Feng ; Huang, Chao-yu ; Sun, Liang-Che ; Lee, Lin-shan
Author_Institution :
Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Date :
Nov. 29, 2010 - Dec. 3, 2010
Abstract :
In this paper, we present an integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In the target corpus, almost all utterances are in the host language, Mandarin, but many of them contain embedded terms (mostly specialized terminology for the course) spoken in the guest language, English. For acoustic modeling, we propose a state mapping approach that merges English states with similar Mandarin states to address the very limited amount of English data, and we integrate it with multi-path speaker adaptation. For language modeling, we combine class-based n-grams, with classes derived from perplexity or part-of-speech (POS) features, random forest language models, and model adaptation. Encouraging improvements in performance were obtained.
Keywords :
natural language processing; speech coding; Mandarin English code; Mandarin state; POS feature; acoustic modeling; class-based n-gram; language modeling; model adaptation; multipath speaker adaptation; perplexity; Accuracy; Acoustics; Adaptation model; Biological system modeling; Data models; Merging; Silicon; Class-based N-gram; MAP; MLLR; POS; RFLM; adaptation; bilingual; code-mixing; component; state-mapping
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684908