Title :
Combination of words and word categories in varigram histories
Author :
Blasig, Reinhard
Author_Institution :
Philips Res. Lab., Aachen, Germany
Abstract :
This paper presents a new kind of language model: category/word varigrams. This model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
Keywords :
computational linguistics; natural languages; 1994 ARPA evaluation data; WER; WSJ0 corpus; category-based modeling; category/word varigrams; language model; perplexity reduction; varigram histories; word categories; word error rate; word history; word sequences; word-based modeling; words; Educational technology; Error analysis; History; Interpolation; Laboratories; Natural languages; Predictive models; Probability
Conference_Title :
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
Conference_Location :
Phoenix, AZ
Print_ISBN :
0-7803-5041-3
DOI :
10.1109/ICASSP.1999.758179