DocumentCode :
2839306
Title :
Detection of language boundary in code-switching utterances by bi-phone probabilities
Author :
Chan, Joyce Y C ; Ching, P.C. ; Lee, Tan ; Meng, Helen M.
Author_Institution :
Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Shatin, China
fYear :
2004
fDate :
15-18 Dec. 2004
Firstpage :
293
Lastpage :
296
Abstract :
In this paper, we present an effective method to detect the language boundary (LB) in code-switching utterances. The utterances are mainly in Cantonese, a commonly used Chinese dialect, with English words occasionally inserted between Cantonese words. Bi-phone probabilities are calculated to measure the confidence that the recognized phones are in Cantonese. Two sets of context-independent mono-phone models are trained separately, one on monolingual Cantonese data and one on monolingual English data. Both knowledge-based and data-driven model selection approaches are studied in order to retain the language-dependent characteristics and to merge duplicated phone sets between the two languages. The LB detection accuracy is 75.12% for utterances that contain a single code-switching word or phrase.
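The idea of scoring recognized phones with bi-phone probabilities can be illustrated with a minimal sketch. This is not the authors' implementation: the add-alpha smoothing, the fixed threshold, the function names, and the toy phone labels are all assumptions made for illustration only. The sketch estimates P(next phone | previous phone) from Cantonese phone sequences and flags transitions in a recognized sequence whose log-probability falls below the threshold as candidate language-boundary regions.

```python
import math
from collections import Counter


def train_biphone(sequences, alpha=1.0):
    """Estimate add-alpha smoothed bi-phone probabilities P(b | a).

    Assumption: simple add-alpha smoothing with one extra slot reserved
    for unseen phones; the paper does not state its estimation method here.
    """
    pair_counts = Counter()
    left_counts = Counter()
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    size = len(vocab) + 1  # +1 reserves probability mass for unseen phones

    def prob(a, b):
        return (pair_counts[(a, b)] + alpha) / (left_counts[a] + alpha * size)

    return prob


def biphone_log_probs(prob, phones):
    """Log-probability of each adjacent phone pair in a recognized sequence."""
    return [math.log(prob(a, b)) for a, b in zip(phones, phones[1:])]


def candidate_boundaries(log_probs, threshold):
    """Indices i where the transition phones[i] -> phones[i+1] scores below threshold."""
    return [i for i, lp in enumerate(log_probs) if lp < threshold]


# Toy, made-up phone labels (hypothetical, not a real phone inventory).
cantonese = [["n", "ei", "h", "ou"], ["h", "ou", "n", "ei"]]
prob = train_biphone(cantonese)

# A recognized sequence whose tail contains phones unseen in the Cantonese data.
recognized = ["n", "ei", "h", "ou", "k", "ae", "t"]
scores = biphone_log_probs(prob, recognized)
flagged = candidate_boundaries(scores, threshold=-1.5)  # hypothetical threshold
```

In this toy run the flagged transitions are the ones entering and inside the unseen tail, which is where a language boundary would be hypothesized. A practical system would tune the threshold on held-out data and smooth the scores over a window rather than judging single transitions.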
Keywords :
probability; speech processing; speech recognition; Cantonese Chinese dialect; bi-phone probabilities; code-switching utterances; context-independent mono-phone models; data-driven model selection; duplicated phone set merging; knowledge-based model selection; language boundary detection; language-dependent characteristics; monolingual Cantonese data; monolingual English data; recognized phones; Acoustic measurements; Automatic speech recognition; Context modeling; Natural languages; Probability; Research and development management; Speech recognition; Switches; Systems engineering and theory; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
Type :
conf
DOI :
10.1109/CHINSL.2004.1409644
Filename :
1409644