Title :
Language Models and Smoothing Methods for Collections with Large Variation in Document Length
Author :
Abdulmutalib, Najeeb ; Fuhr, Norbert
Author_Institution :
Dept. of Comput. & Cognitive Sci., Univ. of Duisburg-Essen, Duisburg
Abstract :
In this paper we present a new language model based on an odds formula, which explicitly incorporates document length as a parameter. Furthermore, a new smoothing method called exponential smoothing is introduced, which can be combined with most language models. We present experimental results for various language models and smoothing methods on a collection with large document length variation, and show that our new methods compare favorably with the best approaches known so far.
Keywords :
document handling; information retrieval; natural language processing; smoothing methods; exponential smoothing; language models; large document length variation; large variation; Databases; Expert systems; Frequency estimation; Information systems; XML; Yield estimation
Conference_Title :
Database and Expert Systems Application, 2008. DEXA '08. 19th International Workshop on
Conference_Location :
Turin
Print_ISBN :
978-0-7695-3299-8
DOI :
10.1109/DEXA.2008.33