Language Models and Smoothing Methods for Collections with Large Variation in Document Length

Author

Abdulmutalib, Najeeb; Fuhr, Norbert

Author_Institution

Dept. of Comput. & Cognitive Sci., Univ. of Duisburg-Essen, Duisburg

fYear

2008

fDate

1-5 Sept. 2008

Firstpage

Lastpage

Abstract

In this paper we present a new language model based on an odds formula, which explicitly incorporates document length as a parameter. Furthermore, a new smoothing method called exponential smoothing is introduced, which can be combined with most language models. We present experimental results for various language models and smoothing methods on a collection with large document length variation, and show that our new methods compare favorably with the best approaches known so far.

Keywords

document handling; information retrieval; natural language processing; smoothing methods; exponential smoothing; language models; large document length variation; large variation; Databases; Expert systems; Frequency estimation; Information retrieval; Information systems; Smoothing methods; XML; Yield estimation

fLanguage

English

Publisher

IEEE

Conference_Titel

19th International Workshop on Database and Expert Systems Application (DEXA '08), 2008

Conference_Location

Turin

ISSN

1529-4188

Print_ISBN

978-0-7695-3299-8

Type

conf

DOI

10.1109/DEXA.2008.33

Filename

4624684

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2830359