DocumentCode
2009349
Title
Language identification in code-switching speech using word-based lexical model
Author
Lyu, Dou-Cheng ; Zhu, Cing-lei ; Lyu, Ren-Yuan ; Ko, Ming-Tat
Author_Institution
Temasek Labs., Univ., Singapore, Singapore
fYear
2010
fDate
Nov. 29 2010-Dec. 3 2010
Firstpage
460
Lastpage
464
Abstract
This paper describes a language identification (LID) task on Mandarin/Taiwanese code-switching utterances. The proposed LID system uses a word-based lexical model and integrates acoustic, phonetic and lexical cues. The first two cues are obtained from a large vocabulary continuous speech recognition (LVCSR) system, and the last is captured by a trained word-based lexical model. Given a sequence of words recognized by the LVCSR system, the lexical model identifies languages according to the frequency and context of each word. Because the switching unit in code-switching speech is the word, experiments showed that the word-based lexical model achieved a 16% relative reduction in classification errors compared with LVCSR-based LID systems.
Keywords
linguistics; speech recognition; switching; vocabulary; classification error; code switching speech; language identification; large vocabulary continuous speech recognition; switching unit; word based lexical model; Code-switching; Speech Recognition; Taiwanese Mandarin;
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location
Tainan
Print_ISBN
978-1-4244-6244-5
Type
conf
DOI
10.1109/ISCSLP.2010.5684483
Filename
5684483