Title :
A New Entropy-based Vocabulary Optimization Approach for Chinese Language Modeling
Author :
Wang, XiaoRui ; Ding, Peng ; Liang, JiaEn ; Xu, Bo
Author_Institution :
Chinese Acad. of Sci., Beijing
fDate :
Aug. 30 2007-Sept. 1 2007
Abstract :
This paper proposes a new entropy-based vocabulary optimization approach for Chinese language modeling. The approach optimizes the language model directly by extending the vocabulary so as to minimize the model's character perplexity. A new criterion for selecting new words is developed based on the character perplexity metric. A fast computation method and a simple divide-and-conquer method are proposed to handle very large corpora. Experiments show about a 3% reduction in character perplexity and a 3% reduction in character error rate on a speech recognition task. The approach is also compared experimentally with other approaches.
Keywords :
divide and conquer methods; natural language processing; optimisation; speech recognition; vocabulary; Chinese language modeling; character error rate reduction; character perplexity metric; character perplexity reduction; divide-and-conquer method; entropy-based vocabulary optimization approach; Automation; Context modeling; Error analysis; Frequency; Iterative methods; Natural languages; Pattern recognition; Stability
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
DOI :
10.1109/NLPKE.2007.4368083