Title of article :
Improved retrieval effectiveness by efficient combination of term proximity and zone scoring: A simulation-based evaluation
Author/Authors :
Akritidis, Leonidas; Katsaros, Dimitrios; Bozanis, Panayiotis
Issue Information :
Journal with serial issue number, year 2012
Pages :
18
From page :
74
To page :
91
Abstract :
During the past few years, commercial Web search engines have augmented their underlying index structures by significantly enriching the information that describes the appearance of a word within a document (Dean, 2009 [7]). This enriched information is now used in complex and effective functions that rank documents with respect to a user query by taking hundreds of features into consideration. Despite this evolution of search engines, past research has mainly concentrated on improving plain Web indexes that store only typical data. In this work we study the problem of organizing an inverted index that stores additional information. In particular, we examine how the physical locations of a document, called zones, can be efficiently integrated with such an index structure. We introduce TZP, an encoder that compresses these zones in combination with the positions of a word in a document by employing a fixed number of bits for each portion of a word's inverted list. We demonstrate that our method allows direct access to the compressed zones and positions without expensive look-ups and avoids decoding any unnecessary information, while its overall index size is comparable to, or even smaller than, that of state-of-the-art schemes. Moreover, we examine how word positions can be combined with zones to improve retrieval effectiveness. We introduce BM25TOPF, a scheme that incorporates term proximity and zone weighting into a single ranking formula. Unlike other term proximity approaches, BM25TOPF also takes into account the ordering of the query terms by rewarding documents that contain them in the correct order. Our experiments with the Web Adhoc Task of TREC 2009 and a set of our own queries show that BM25TOPF outperforms current state-of-the-art approaches by a margin of 6% to 11%.
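The abstract says TZP spends a fixed number of bits on each portion of a word's inverted list, so that zones and positions can be read directly without decoding unrelated data. The record gives no details of TZP's actual layout, so what follows is only a minimal Python sketch of the general idea of fixed-width bit packing, not the authors' encoder. The names `FixedWidthBlock`, `encode_occurrences` and `decode_one`, the zone ids, and the layout that folds each zone id together with a position offset into one packed entry are all hypothetical.

```python
from typing import List, Tuple


def bit_width(values: List[int]) -> int:
    """Smallest number of bits that can hold every value (at least 1)."""
    return max(1, max(values).bit_length()) if values else 1


class FixedWidthBlock:
    """Non-negative integers packed with one shared bit width.

    Every value occupies exactly `width` bits, so the i-th value starts at
    bit i * width and can be read without decoding the values before it.
    """

    def __init__(self, values: List[int]) -> None:
        if any(v < 0 for v in values):
            raise ValueError("values must be non-negative")
        self.width = bit_width(values)
        self.count = len(values)
        packed = 0
        for i, v in enumerate(values):
            packed |= v << (i * self.width)
        self.packed = packed  # a Python int stands in for a raw bit buffer

    def get(self, i: int) -> int:
        if not 0 <= i < self.count:
            raise IndexError(i)
        mask = (1 << self.width) - 1
        return (self.packed >> (i * self.width)) & mask

    def size_in_bits(self) -> int:
        return self.width * self.count


def encode_occurrences(occ: List[Tuple[int, int]]) -> Tuple[int, int, FixedWidthBlock]:
    """Pack the (zone, position) pairs of one word in one document.

    Hypothetical layout: each entry is (position - base) shifted left by
    zone_bits and OR-ed with the zone id. Positions are stored relative to
    the smallest one (frame of reference) instead of as gaps, so every
    entry stays directly addressable.
    """
    if not occ:
        raise ValueError("need at least one occurrence")
    base = min(p for _, p in occ)
    zone_bits = bit_width([z for z, _ in occ])
    entries = [((p - base) << zone_bits) | z for z, p in occ]
    return base, zone_bits, FixedWidthBlock(entries)


def decode_one(base: int, zone_bits: int, block: FixedWidthBlock, i: int) -> Tuple[int, int]:
    """Return the i-th (zone, position) pair, reading only that entry."""
    entry = block.get(i)
    zone = entry & ((1 << zone_bits) - 1)
    return zone, base + (entry >> zone_bits)


if __name__ == "__main__":
    # Hypothetical zone ids: 0 = body, 1 = title, 2 = anchor text.
    occ = [(1, 3), (0, 17), (0, 42), (2, 120)]
    base, zone_bits, block = encode_occurrences(occ)
    print(decode_one(base, zone_bits, block, 2))  # (0, 42)
    print(block.width, block.size_in_bits())      # 9 36
```

Storing positions as offsets from a base rather than as d-gaps may cost a few extra bits per entry, but it keeps each entry addressable by simple arithmetic, which is the direct-access property the abstract emphasizes.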
Keywords :
Web, Search engines, Inverted index, Simulation, Evaluation
Journal title :
Simulation Modelling Practice and Theory
Serial Year :
2012
Record number :
1582381