Title of article :
Investigation of Luhn’s claim on information retrieval
Author/Authors :
KOCABAS, Ilker Ege University - International Computer Institute, TURKEY , DINCER, Bekir Taner Mugla University - Department of Statistics, TURKEY , KARAOGLAN, Bahar Ege University - International Computer Institute, TURKEY
Abstract :
In this study, we show how Luhn’s claim about the degree of importance of a word in a document can be related to information retrieval. His basic idea is transformed into z-scores, which serve as term weights for modeling term frequency (tf) within documents. The Luhn-based models presented in this paper are used as the TF component of the proposed TF × IDF weighting schemes. The resulting term weighting functions are applied to the TREC-6, -7, and -8 databases. The experimental results are consistent with Luhn’s claim, yielding high mean average precision (MAP) for terms whose frequencies lie around the mean term frequency within a document. On the other hand, a weighting that sharply discriminates between the importance of low/high frequencies and that of medium frequencies degrades retrieval performance. Therefore, any weighting scheme (TF) that is directly proportional to tf can achieve high retrieval performance, provided that it optimally reflects the differences in importance across tf values and optimally eliminates the terms that have high frequencies.
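The z-score idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual weighting functions, so everything here is an assumption made for illustration: the function names (`tf_z_scores`, `idf`, `luhn_like_weight`), the use of the population standard deviation, the plain logarithmic IDF, and the bell-shaped weight that peaks at the mean frequency.

```python
import math
from collections import Counter


def tf_z_scores(tokens):
    """Map each term to the z-score of its frequency within one document.

    Assumption: the mean and (population) standard deviation are taken
    over the frequencies of the distinct terms in the document.
    """
    counts = Counter(tokens)
    freqs = list(counts.values())
    mean = sum(freqs) / len(freqs)
    std = math.sqrt(sum((f - mean) ** 2 for f in freqs) / len(freqs))
    if std == 0:
        # All terms occur equally often, so no term stands out.
        return {term: 0.0 for term in counts}
    return {term: (f - mean) / std for term, f in counts.items()}


def idf(term, docs):
    """Plain logarithmic IDF over a list of tokenized documents (assumed form)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def luhn_like_weight(z):
    """Hypothetical TF component: largest for mid-frequency terms (z near 0)."""
    return math.exp(-0.5 * z * z)


def tf_idf(term, doc, docs):
    """Combine the hypothetical TF component with IDF."""
    z = tf_z_scores(doc).get(term)
    if z is None:
        return 0.0
    return luhn_like_weight(z) * idf(term, docs)
```

With this particular choice of `luhn_like_weight`, terms whose frequency is close to the document mean receive the largest TF component, while very rare and very frequent terms are damped symmetrically. Swapping in a different function of `z` is how one would explore weightings that separate low/high from medium frequencies more or less sharply.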
Keywords :
Luhn , information retrieval , term weighting , indexing
Journal title :
Turkish Journal of Electrical Engineering and Computer Sciences