DocumentCode
2830359
Title
Language Models and Smoothing Methods for Collections with Large Variation in Document Length
Author
Abdulmutalib, Najeeb ; Fuhr, Norbert
Author_Institution
Dept. of Comput. & Cognitive Sci., Univ. of Duisburg-Essen, Duisburg
fYear
2008
fDate
1-5 Sept. 2008
Firstpage
9
Lastpage
14
Abstract
In this paper we present a new language model based on an odds formula, which explicitly incorporates document length as a parameter. Furthermore, a new smoothing method called exponential smoothing is introduced, which can be combined with most language models. We present experimental results for various language models and smoothing methods on a collection with large document length variation, and show that our new methods compare favorably with the best approaches known so far.
Keywords
document handling; information retrieval; natural language processing; smoothing methods; exponential smoothing; language models; large document length variation; large variation; Databases; Expert systems; Frequency estimation; Information systems; XML; Yield estimation
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications, 2008. DEXA '08. 19th International Workshop on
Conference_Location
Turin
ISSN
1529-4188
Print_ISBN
978-0-7695-3299-8
Type
conf
DOI
10.1109/DEXA.2008.33
Filename
4624684