• DocumentCode
    2018301
  • Title
    An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling
  • Author
    Yeh, Ching-Feng ; Huang, Chao-yu ; Sun, Liang-Che ; Lee, Lin-shan
  • Author_Institution
    Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    214
  • Lastpage
    219
  • Abstract
    In this paper, we present an integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. The target corpus considered here has almost all utterances in the host language of Mandarin, while many of them are embedded with terms (mostly special terminologies for the course) produced in the guest language of English. For acoustic modeling, we propose a state mapping approach to merge English states with similar Mandarin states to solve the problem of very limited data for English, and integrate it with multi-path speaker adaptation. For language modeling, we integrate class-based n-grams based on perplexity or POS features, random forest and model adaptation. Very encouraging improvements in performance were obtained.
  • Keywords
    natural language processing; speech coding; Mandarin English code; Mandarin state; POS feature; acoustic modeling; class-based n-gram; language modeling; model adaptation; multipath speaker adaptation; perplexity; Accuracy; Acoustics; Adaptation model; Biological system modeling; Data models; Merging; Silicon; Class-based N-gram; MAP; MLLR; POS; RFLM; adaptation; bilingual; code-mixing; component; state-mapping
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2010.5684908
  • Filename
    5684908