DocumentCode
2018301
Title
An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling
Author
Yeh, Ching-Feng ; Huang, Chao-yu ; Sun, Liang-Che ; Lee, Lin-shan
Author_Institution
Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear
2010
fDate
Nov. 29 2010-Dec. 3 2010
Firstpage
214
Lastpage
219
Abstract
In this paper, we present an integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In the target corpus, almost all utterances are in the host language, Mandarin, while many of them embed terms (mostly course-specific terminology) spoken in the guest language, English. For acoustic modeling, we propose a state mapping approach that merges English states with similar Mandarin states to address the very limited amount of English data, and we integrate it with multi-path speaker adaptation. For language modeling, we integrate class-based n-grams (with classes based on perplexity or POS features), random forest language models, and model adaptation. Very encouraging improvements in performance were obtained.
Keywords
natural language processing; speech coding; Mandarin English code; Mandarin state; POS feature; acoustic modeling; class-based n-gram; language modeling; model adaptation; multipath speaker adaptation; perplexity; Accuracy; Acoustics; Adaptation model; Biological system modeling; Data models; Merging; Silicon; Class-based N-gram; MAP; MLLR; POS; RFLM; adaptation; bilingual; code-mixing; component; state-mapping
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location
Tainan
Print_ISBN
978-1-4244-6244-5
Type
conf
DOI
10.1109/ISCSLP.2010.5684908
Filename
5684908