DocumentCode :
1910921
Title :
A New Entropy-based Vocabulary Optimization Approach for Chinese Language Modeling
Author :
Wang, XiaoRui ; Ding, Peng ; Liang, JiaEn ; Xu, Bo
Author_Institution :
Chinese Acad. of Sci., Beijing
fYear :
2007
fDate :
Aug. 30 2007-Sept. 1 2007
Firstpage :
242
Lastpage :
247
Abstract :
This paper proposes a new entropy-based vocabulary optimization approach for Chinese language modeling. The approach aims to optimize the language model directly by extending the vocabulary, that is, to minimize the character perplexity of the language model. A new criterion for selecting new words is developed based on the character perplexity metric. A fast computation method and a simple divide-and-conquer method are proposed to handle very large corpora. Experiments show about a 3% reduction in character perplexity and a 3% reduction in character error rate in a speech recognition task. Comparison experiments against other approaches were also conducted.
Keywords :
divide and conquer methods; natural language processing; optimisation; speech recognition; vocabulary; Chinese language modeling; character error rate reduction; character perplexity metric; character perplexity reduction; divide-and-conquer method; entropy-based vocabulary optimization approach; Automation; Context modeling; Error analysis; Frequency; Iterative methods; Natural languages; Pattern recognition; Stability
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1611-0
Electronic_ISBN :
978-1-4244-1611-0
Type :
conf
DOI :
10.1109/NLPKE.2007.4368083
Filename :
4368083