Title :
Long-term relevance feedback using simple PCA and linear transformation
Author :
Tai, Xiaoying ; Ren, Fuji ; Kita, Kenji
Author_Institution :
Fac. of Eng., Tokushima Univ., Japan
Abstract :
This paper proposes a new method for improving the retrieval performance of the vector space model (VSM), in part by preserving user-supplied relevance information in the system over the long term. The proposed method incorporates user relevance feedback and original document similarity information into a retrieval model built from a sequence of linear transformations. High-dimensional, sparse vectors are mapped into a low-dimensional vector space, namely the space representing the latent semantic meanings of words, by using SPCA (simple principal component analysis). An experimental information retrieval system based on the proposed method has been built, and experiments have been carried out on the Medline and Cranfield collections. The improvements in average precision over the LSI (latent semantic indexing) model are 6.80% (Medline) and 67.46% (Cranfield) for the two training data sets, and 4.71% (Medline) and 8.12% (Cranfield) for the test data. The experimental results show that the proposed method achieves better retrieval performance and makes it possible to preserve user-supplied relevance information in the system over the long term for later use.
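The dimensionality-reduction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes ordinary PCA computed through an SVD as a stand-in for SPCA, whose details the abstract does not give, and it leaves out the relevance-feedback transformations entirely. The toy matrix and the names `fit_projection`, `project`, and `rank_documents` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_projection(doc_term, k):
    """Return (mean, components) for a k-dimensional PCA projection.

    Assumption: plain PCA via SVD, used here in place of SPCA.
    """
    mean = doc_term.mean(axis=0)
    centered = doc_term - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(vectors, mean, components):
    """Map term-space row vectors into the low-dimensional space."""
    return (vectors - mean) @ components.T

def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by decreasing cosine similarity."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    q = query_vec / (np.linalg.norm(query_vec) + eps)
    d = doc_vecs / (np.linalg.norm(doc_vecs, axis=1, keepdims=True) + eps)
    return np.argsort(-(d @ q))

# Toy term counts (rows: documents, columns: terms); illustrative data only.
docs = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)
query = np.array([1, 1, 0, 0], dtype=float)

mean, comps = fit_projection(docs, k=2)
low_docs = project(docs, mean, comps)
low_query = project(query[None, :], mean, comps)[0]
ranking = rank_documents(low_query, low_docs)
print(ranking)
```

In this toy example the query shares terms only with the first two documents, so those two should rank ahead of the others in the reduced space. A real system would work on a large, sparse, weighted term-document matrix rather than dense raw counts.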
Keywords :
indexing; principal component analysis; relevance feedback; Cranfield collection; Medline collection; average precision; document similarity information; high-dimensional vectors; information retrieval performance; latent semantic indexing model; latent semantic meanings; linear transformation; long-term relevance feedback; low-dimensional vector space; simple principal component analysis; sparse vectors; training data sets; user-supplied relevance information; vector space model; Feedback; Functional analysis; Indexing; Information retrieval; Large scale integration; Multidimensional systems; Principal component analysis; Testing; Training data; Vectors;
Conference_Title :
Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on
Print_ISBN :
0-7695-1668-8
DOI :
10.1109/DEXA.2002.1045909