Title :
Joint-character-POC N-gram language modeling for Chinese speech recognition
Author :
Bin Wang ; Zhijian Ou ; Jian Li ; Kawamura, Atsuo
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Abstract :
The state-of-the-art language models (LMs) for Chinese speech recognition are word n-gram models. However, in Chinese, characters themselves carry morphological meaning, and words are not consistently defined. There has been recent interest in building character n-gram LMs and combining them with word n-gram LMs. In this paper, in order to exploit both character-level and word-level constraints, we propose the joint n-gram LM, an n-gram model over joint states, where each joint state is a pair consisting of a character and its position-of-character (POC) tag. We point out a pitfall in naive solutions to the smoothing and scoring problems for joint n-gram models and provide corrected solutions. For experimental comparison, different LMs (word 4-grams, character 6-grams and joint 6-grams) are tested for speech recognition, using a training corpus of 1.9 billion characters. The joint n-gram LM achieves performance improvements, especially in recognizing utterances that contain out-of-vocabulary (OOV) words.
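As an illustration of the joint state described in the abstract, the following minimal sketch converts a word-segmented sentence into a sequence of (character, POC tag) pairs and extracts n-grams over them. The abstract does not specify the tag inventory, so the four tags used here (B = word-initial, M = word-internal, E = word-final, S = single-character word) are an assumption, as are the function names `to_joint_states` and `ngrams` and the example sentence. The sketch does not cover the smoothing and scoring corrections the paper refers to.

```python
from typing import List, Sequence, Tuple

JointState = Tuple[str, str]  # (character, POC tag)


def to_joint_states(words: List[str]) -> List[JointState]:
    """Map a word-segmented sentence to (character, POC tag) pairs.

    Assumed tag set (not taken from the paper): B, M, E, S.
    """
    states: List[JointState] = []
    for word in words:
        n = len(word)
        for i, ch in enumerate(word):
            if n == 1:
                tag = "S"   # single-character word
            elif i == 0:
                tag = "B"   # first character of a multi-character word
            elif i == n - 1:
                tag = "E"   # last character of a multi-character word
            else:
                tag = "M"   # character inside a multi-character word
            states.append((ch, tag))
    return states


def ngrams(seq: Sequence[JointState], n: int) -> List[Tuple[JointState, ...]]:
    """Return all contiguous n-grams of joint states."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


# Hypothetical segmented sentence, for illustration only.
segmented = ["清华大学", "在", "北京"]
joint = to_joint_states(segmented)
bigrams = ngrams(joint, 2)
```

Because each state carries both the character and its position within a word, an n-gram over these states sees character identity and word-boundary information at the same time, which is the sense in which such a model can draw on both character-level and word-level constraints.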
Keywords :
natural language processing; smoothing methods; speech recognition; Chinese speech recognition; OOV words; POC tag; character-level constraints; joint n-gram LM; joint n-gram models; joint-character-POC N-gram language modeling; out-of-vocabulary words; performance improvements; position-of-character tag; scoring problem; smoothing problem; utterance recognition; word n-gram LM; word n-gram models; word-level constraints; Computational modeling; Joints; Smoothing methods; Speech; Speech recognition; Standards; Training; Chinese Speech Recognition; Joint n-gram; Language Model;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936588