Title :
Semantics-based language modeling for Cantonese-English code-mixing speech recognition
Author :
Cao, Houwei ; Ching, P.C. ; Lee, Tan ; Yeung, Yu Ting
Author_Institution :
Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
fDate :
Nov. 29 2010-Dec. 3 2010
Abstract :
This paper addresses language modeling for large-vocabulary continuous speech recognition (LVCSR) of Cantonese-English code-mixing utterances spoken in daily communication. Because a sufficient amount of code-mixing text data is not available, translation-based and semantics-based mapping are applied to n-grams to better estimate the probabilities of low-frequency and unseen mixed-language n-gram events. In the translation-based mapping scheme, a Cantonese-to-English translation dictionary is used to convert monolingual Cantonese n-grams into mixed-language n-grams. In the semantics-based mapping scheme, n-gram mapping is based on the meaning and syntactic function of the English words in the lexicon. Several semantics-based language models, each trained with a different mapping scheme, are evaluated in terms of perplexity and LVCSR performance. Experimental results confirm that the more mixed-language n-grams are observed after mapping, the better both the language model perplexity and the recognition performance. Compared with the baseline 3-gram LM, the proposed language models significantly improve recognition of the embedded English words. The best recognition accuracies attained are 63.9% for English words and 74.7% for Cantonese characters in code-mixing utterances.
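The two mapping schemes described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the one-word-at-a-time substitution, the `weight` parameter, and the word-class table are hypothetical assumptions, and the paper's actual mapping rules and count redistribution may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def translation_map_ngrams(
    cantonese_counts: Dict[NGram, float],
    c2e_dict: Dict[str, List[str]],
    weight: float = 1.0,
) -> Counter:
    """Hypothetical translation-based mapping: substitute one Cantonese word
    at a time with each of its English translations, giving each resulting
    mixed-language n-gram the source count scaled by `weight` (an assumed
    parameter, not taken from the paper)."""
    mixed: Counter = Counter()
    for ngram, count in cantonese_counts.items():
        for i, word in enumerate(ngram):
            for eng in c2e_dict.get(word, []):
                mixed[ngram[:i] + (eng,) + ngram[i + 1:]] += count * weight
    return mixed


def semantic_map_ngrams(
    mixed_counts: Dict[NGram, float],
    word_class: Dict[str, str],
) -> Counter:
    """Hypothetical semantics-based mapping: an English word in an observed
    n-gram is replaced by other English words that share its (assumed)
    semantic/syntactic class, so unseen n-grams inherit the observed count."""
    members: Dict[str, List[str]] = defaultdict(list)
    for word, cls in word_class.items():
        members[cls].append(word)

    out: Counter = Counter()
    for ngram, count in mixed_counts.items():
        for i, word in enumerate(ngram):
            cls = word_class.get(word)
            if cls is None:
                continue  # not an English word with a known class
            for other in members[cls]:
                if other != word:
                    out[ngram[:i] + (other,) + ngram[i + 1:]] += count
    return out
```

For example, a Cantonese n-gram ("我", "去", "開會") with the dictionary entry 開會 → "meeting" yields the mixed n-gram ("我", "去", "meeting"), and if "meeting" and "seminar" are assumed to share a class, the semantic step also produces ("我", "去", "seminar"). In a full pipeline these pseudo-observed counts would presumably be merged with the real training counts before n-gram estimation and smoothing; that step is omitted here.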
Keywords :
speech recognition; Cantonese English code mixing speech recognition; Cantonese characters; LVCSR; code mixing text data; code mixing utterances; English characters; semantics based language modeling; semantics based mapping; translation based mapping; Accuracy; Data models; Dictionaries; Hidden Markov models; Speech; Speech recognition; Training; ASR; Cantonese-English code-mixing; language modeling; semantics
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684900