DocumentCode :
312002
Title :
Combination of word-based and category-based language models
Author :
Niesler, T.R. ; Woodland, P.C.
Author_Institution :
Dept. of Eng., Cambridge Univ., UK
Volume :
1
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
220
Abstract :
A language model combining word-based and category-based n-grams within a backoff framework is presented. Word n-grams conveniently capture sequential relations between particular words, while the category model, which is based on part-of-speech classifications and allows ambiguous category membership, is able to generalise to unseen word sequences and is therefore appropriate in backoff situations. Experiments on the LOB, Switchboard and WSJ0 corpora demonstrate that the technique greatly improves language model perplexities for sparse training sets, and offers significantly improved complexity versus performance tradeoffs when compared with standard trigram models.
Keywords :
computational linguistics; linguistics; word processing; LOB; Switchboard; WSJ0 corpora; ambiguous category membership; backoff framework; category-based language models; category-based n-grams; language model perplexities; part-of-speech classifications; sequential relations; sparse training sets; trigram models; unseen word sequences; word-based language models; Computational complexity; Context modeling; Equations; History; Marine vehicles; Runtime; Speech recognition; Tires
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607081
Filename :
607081