DocumentCode :
2018301
Title :
An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling
Author :
Yeh, Ching-Feng ; Huang, Chao-yu ; Sun, Liang-Che ; Lee, Lin-shan
Author_Institution :
Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear :
2010
fDate :
Nov. 29 2010-Dec. 3 2010
Firstpage :
214
Lastpage :
219
Abstract :
In this paper, we present an integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In the target corpus, almost all utterances are in the host language, Mandarin, but many contain embedded terms (mostly terminology specific to the course) spoken in the guest language, English. For acoustic modeling, we propose a state mapping approach that merges English states with similar Mandarin states to address the very limited amount of English data, and we integrate it with multi-path speaker adaptation. For language modeling, we integrate class-based n-grams built on perplexity or POS features, random forest language models, and model adaptation. Very encouraging improvements in performance were obtained.
Keywords :
natural language processing; speech coding; Mandarin English code; Mandarin state; POS feature; acoustic modeling; class-based n-gram; language modeling; model adaptation; multipath speaker adaptation; perplexity; Accuracy; Acoustics; Adaptation model; Biological system modeling; Data models; Merging; Silicon; Class-based N-gram; MAP; MLLR; POS; RFLM; adaptation; bilingual; code-mixing; component; state-mapping;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
Type :
conf
DOI :
10.1109/ISCSLP.2010.5684908
Filename :
5684908