• DocumentCode
    2830359
  • Title
    Language Models and Smoothing Methods for Collections with Large Variation in Document Length
  • Author
    Abdulmutalib, Najeeb ; Fuhr, Norbert
  • Author_Institution
    Dept. of Comput. & Cognitive Sci., Univ. of Duisburg-Essen, Duisburg
  • fYear
    2008
  • fDate
    1-5 Sept. 2008
  • Firstpage
    9
  • Lastpage
    14
  • Abstract
    In this paper we present a new language model based on an odds formula, which explicitly incorporates document length as a parameter. Furthermore, a new smoothing method called exponential smoothing is introduced, which can be combined with most language models. We present experimental results for various language models and smoothing methods on a collection with large document length variation, and show that our new methods compare favorably with the best approaches known so far.
  • Keywords
    document handling; information retrieval; natural language processing; smoothing methods; exponential smoothing; language models; large document length variation; large variation; Databases; Expert systems; Frequency estimation; Information systems; XML; Yield estimation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Application, 2008. DEXA '08. 19th International Workshop on
  • Conference_Location
    Turin
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-3299-8
  • Type
    conf
  • DOI
    10.1109/DEXA.2008.33
  • Filename
    4624684