• DocumentCode
    2839306
  • Title
    Detection of language boundary in code-switching utterances by bi-phone probabilities
  • Author
    Chan, Joyce Y C ; Ching, P.C. ; Lee, Tan ; Meng, Helen M.
  • Author_Institution
    Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Shatin, China
  • fYear
    2004
  • fDate
    15-18 Dec. 2004
  • Firstpage
    293
  • Lastpage
    296
  • Abstract
    In this paper, we present an effective method to detect the language boundary (LB) in code-switching utterances. The utterances are spoken mainly in Cantonese, a commonly used Chinese dialect, with occasional English words inserted between Cantonese words. Bi-phone probabilities are calculated to measure the confidence that the recognized phones are Cantonese. Two sets of context-independent mono-phone models are trained separately, one on monolingual Cantonese data and one on monolingual English data. Both knowledge-based and data-driven model selection approaches are studied, with the aim of retaining language-dependent characteristics while merging phones duplicated across the two phone sets. The LB detection accuracy is 75.12% for utterances that contain a single code-switching word or phrase.
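    The abstract does not spell out how bi-phone probabilities are turned into a boundary decision. The following is only an illustrative sketch under stated assumptions, not the authors' method: the bi-phone probability is taken here as an add-alpha smoothed joint probability of adjacent phone pairs estimated from Cantonese phone strings; English phones are assumed to appear as pairs the Cantonese model has never seen; and, because the reported figure concerns utterances with one code-switching word or phrase, a maximum-sum segment search (Kadane's algorithm) is swapped in to find the single stretch of low-probability transitions. The names `train_biphone`, `detect_boundary`, `alpha`, and `threshold` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple


def train_biphone(sequences: Sequence[Sequence[str]],
                  alpha: float = 0.1) -> Callable[[str, str], float]:
    """Hypothetical helper: return log P(a, b), an add-alpha smoothed
    bi-phone (adjacent phone pair) probability from Cantonese phone strings."""
    pairs: Counter = Counter()
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        pairs.update(zip(seq, seq[1:]))  # count each adjacent phone pair
    total = sum(pairs.values())
    v = len(vocab) + 1  # one extra slot for phones never seen in Cantonese
    denom = total + alpha * v * v

    def logprob(a: str, b: str) -> float:
        return math.log((pairs[(a, b)] + alpha) / denom)

    return logprob


def detect_boundary(phones: Sequence[str],
                    logprob: Callable[[str, str], float],
                    threshold: float) -> Optional[Tuple[int, int]]:
    """Hypothetical decision rule: find the one stretch of transitions that
    looks least like Cantonese.

    Returns (i, j): one language boundary lies between phones[i] and
    phones[i + 1], the other between phones[j] and phones[j + 1].
    Returns None if no transition scores below the threshold."""
    # Positive score: transition k (phones[k] -> phones[k + 1]) looks non-Cantonese.
    scores = [threshold - logprob(a, b) for a, b in zip(phones, phones[1:])]
    best, span = 0.0, None
    run, start = 0.0, 0
    for k, s in enumerate(scores):
        # Kadane's maximum-sum subarray: restart the run once it stops helping.
        if run <= 0:
            run, start = s, k
        else:
            run += s
        if run > best:
            best, span = run, (start, k)
    return span


# Toy usage: "X" and "Y" stand in for recognized English phones.
model = train_biphone([["a", "b", "c"] * 3] * 10)
print(detect_boundary(["a", "b", "c", "X", "Y", "a", "b", "c"], model, -3.0))
```

    With the returned (i, j), the hypothesized English segment is `phones[i + 1 : j + 1]`; in the toy usage that is `["X", "Y"]`. Restricting the search to one contiguous span mirrors the single-insertion condition in the abstract; utterances with several code-switching words would need a different decision rule.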
  • Keywords
    probability; speech processing; speech recognition; Cantonese Chinese dialect; bi-phone probabilities; code-switching utterances; context-independent mono-phone models; data-driven model selection; duplicated phone set merging; knowledge-based model selection; language boundary detection; language-dependent characteristics; monolingual Cantonese data; monolingual English data; recognized phones; Acoustic measurements; Automatic speech recognition; Context modeling; Natural languages; Probability; Research and development management; Speech recognition; Switches; Systems engineering and theory; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2004 International Symposium on Chinese Spoken Language Processing
  • Print_ISBN
    0-7803-8678-7
  • Type
    conf
  • DOI
    10.1109/CHINSL.2004.1409644
  • Filename
    1409644