• DocumentCode
    2009349
  • Title

    Language identification in code-switching speech using word-based lexical model

  • Author

    Lyu, Dou-Cheng ; Zhu, Cing-lei ; Lyu, Ren-Yuan ; Ko, Ming-Tat

  • Author_Institution
    Temasek Labs., Univ., Singapore, Singapore
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    460
  • Lastpage
    464
  • Abstract
    This paper describes a language identification (LID) task on Mandarin/Taiwanese code-switching utterances. The proposed word-based lexical model of this LID system integrates acoustic, phonetic and lexical cues. The first two cues are obtained from a large vocabulary continuous speech recognition (LVCSR) system, and the last is obtained by training a word-based lexical model. Given a sequence of words recognized by the LVCSR system, the lexical model identifies languages according to the frequency and context of each word. Because the switching unit in code-switching speech is the word, the experiments showed that the word-based lexical model achieved a 16% relative reduction in classification errors compared with LVCSR-based LID systems.
  • Keywords
    linguistics; speech recognition; switching; vocabulary; classification error; code switching speech; language identification; large vocabulary continuous speech recognition; switching unit; word based lexical model; Code-switching; Speech Recognition; Taiwanese Mandarin;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type

    conf

  • DOI
    10.1109/ISCSLP.2010.5684483
  • Filename
    5684483