Title :
Query Representation through Lexical Association for Information Retrieval
Author :
Goyal, Pawan ; Behera, Laxmidhar ; McGinnity, Thomas Martin
Author_Institution :
Intell. Syst. Res. Centre, Univ. of Ulster, Londonderry, UK
Abstract :
A user query for information retrieval (IR) applications may not contain the most appropriate terms (words) as actually intended by the user. This is usually referred to as the term mismatch problem and is a crucial research issue in IR. Using the notion of relevance, we provide a comprehensive theoretical analysis of a parametric query vector, which is assumed to represent the information needs of the user. A lexical association function is derived analytically using the system relevance criteria. The derivation is further justified using empirical evidence from the user relevance criteria. The analytical derivation presented in this paper provides a proper mathematical framework for query expansion techniques, which have largely been heuristic in the existing literature. Because it uses the generalized retrieval framework, the proposed query representation model is equally applicable to the vector space model (VSM), Okapi best matching 25 (Okapi BM25), and the language model (LM). Experiments over various data sets from TREC show that the proposed query representation gives statistically significant improvements over the baseline Okapi BM25 and LM as well as other well-known global query expansion techniques. The empirical results, along with the theoretical foundations of the query representation, confirm that the proposed model extends the state of the art in global query expansion.
Keywords :
computational linguistics; query processing; relevance feedback; word processing; IR; LM; Okapi best matching-25; TREC; VSM; analytical derivation; baseline Okapi BM25; comprehensive theoretical analysis; global query expansion techniques; heuristics; information retrieval; language model; lexical association function; mathematical framework; parametric query vector; statistical analysis; system relevance criteria; term mismatch problem; user query representation; user relevance criteria; vector space model; word mismatch problem; Context awareness; Correlation; Indexes; Markov processes; Mathematical model; lexical association; query expansion
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2011.171