DocumentCode :
3576342
Title :
Investigating sample selection bias in the relevance feedback algorithm of the vector space model for Information Retrieval
Author :
Melucci, Massimo
Author_Institution :
Dept. of Inf. Eng., Univ. of Padua, Padua, Italy
fYear :
2014
Firstpage :
83
Lastpage :
89
Abstract :
Information Retrieval (IR) is concerned with indexing and retrieving documents including information relevant to a user's information need. Relevance Feedback (RF) is an effective technique for improving IR and it consists of gathering further data representing the user's information need and automatically creating a new query. As RF relies on the ability of an IR system to learn new queries and is mostly based on statistical methods, a parallel between RF and statistical Machine Learning (ML) can be drawn. However, the effectiveness of RF is due to the biased selection of the sample data, thus contradicting the requirement that effective statistical learning is based on unbiased sample data. This paper studies this contradiction and suggests that RF cannot be straightforwardly studied within statistical ML without considering the intrinsic nature of the data managed by an IR system and of the user's information need. In particular, the paper reports that an RF algorithm is mostly influenced by the distance between the informative content of the training set and the informative content of the test set and is not influenced by sample selection bias.
Keywords :
information retrieval; learning (artificial intelligence); statistical analysis; IR system; ML; RF algorithm; information retrieval; informative content; relevance feedback algorithm; sample selection bias; statistical machine learning; training set; vector space model; Algorithm design and analysis; Convergence; Machine learning algorithms; Probability distribution; Radio frequency; Training; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Science and Advanced Analytics (DSAA), 2014 International Conference on
Type :
conf
DOI :
10.1109/DSAA.2014.7058056
Filename :
7058056