Title :
A maximum entropy approach for integrating semantic information in statistical language models
Author :
Chueh, Chuang-Hua ; Chien, Jen-Tzung ; Wang, Hsin-Min
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Cheng Kung Univ., Tainan, Taiwan
Abstract :
In this paper, we propose an adaptive statistical language model that incorporates semantic information into an n-gram model. Traditional n-gram models exploit only the immediately preceding words of the history. We first introduce the semantic topic as a new information source that captures long-distance information for language modeling. We then adopt the maximum entropy (ME) approach, instead of the conventional linear interpolation method, to integrate the semantic information with the n-gram model. Under the ME approach, each information source gives rise to a set of constraints that the hybrid model must satisfy. In the experiments, the ME language models, trained on the China Times newswire corpus, achieved a 40% perplexity reduction over the baseline bigram model.
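As an illustration of the general form such a model takes, the following is a minimal sketch of a log-linear (maximum entropy) conditional distribution that combines bigram and topic features, together with a perplexity function. It is not the authors' implementation: the feature set, the toy vocabulary, the weight values, and the names `me_prob` and `perplexity` are all assumptions made for this example, and the abstract does not specify how the weights are estimated or how topics are derived.

```python
import math

def me_prob(word, prev, topic, vocab, weights):
    """p(word | prev, topic) under a log-linear (maximum entropy) model."""
    def score(w):
        # Sum the weights of the binary features active for candidate w.
        return (weights.get(("bigram", prev, w), 0.0)
                + weights.get(("topic", topic, w), 0.0))
    # Normalizer over the vocabulary so the distribution sums to one.
    z = sum(math.exp(score(w)) for w in vocab)
    return math.exp(score(word)) / z

def perplexity(probs):
    """Perplexity of a word sequence given its per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Hypothetical vocabulary and weights, chosen only for illustration.
vocab = ["stock", "market", "game"]
weights = {
    ("bigram", "stock", "market"): 1.2,   # n-gram information source
    ("topic", "finance", "market"): 0.8,  # semantic topic information source
    ("topic", "sports", "game"): 0.9,
}

# Distribution over the next word after "stock" in a finance-topic history.
dist = {w: me_prob(w, "stock", "finance", vocab, weights) for w in vocab}
```

In this form both information sources contribute to a single normalized score rather than being mixed as separate distributions, which is the structural difference from linear interpolation. In a trained ME model the weights would be fitted so that the constraints from each source hold on the training data; here they are simply fixed by hand.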
Keywords :
linguistics; maximum entropy methods; natural languages; ME language models; adaptive statistical language model; information source constraints; long distance information extraction; maximum entropy method; n-gram model; natural language regularities; perplexity reduction; semantic information; Automatic speech recognition; Computer science; Context modeling; Data mining; Electronic mail; Entropy; History; Information science; Interpolation; Natural languages
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
DOI :
10.1109/CHINSL.2004.1409648