DocumentCode :
3539194
Title :
Query classification using asymmetric learning
Author :
Zhu, Zheng ; Levene, Mark ; Cox, Ingemar J.
Author_Institution :
Birkbeck Coll., Univ. of London, London, UK
fYear :
2009
fDate :
4-6 Aug. 2009
Firstpage :
518
Lastpage :
524
Abstract :
Understanding the meaning of queries is a key task. Query classification is challenging because queries are usually short and often ambiguous. A common approach to the problem of short and noisy queries is to enrich them. Various enrichment strategies have been proposed, based on either pseudo-relevance feedback or secondary sources of information. In general, algorithms based on pseudo-relevance feedback exhibit superior performance. However, in this case query classification can only occur after retrieval, as the result set is needed to apply pseudo-relevance feedback. Since some applications may prefer to perform query classification prior to, or in parallel with, retrieval, there is a need to improve the performance of query classification based on secondary sources. In this paper, we present a hybrid strategy in which training is based on pseudo-relevance feedback, but testing is based on a secondary source, specifically Yahoo's "suggested keywords". These keywords are based on co-occurrence data across queries. The classifier, which is built offline with training data, makes use of the top-n results during training, but not during testing. Thus, there is an asymmetry between the training and testing data. We compared classification using symmetric and asymmetric approaches on a large AOL search log. Symmetric training and testing using queries enriched with Yahoo keywords yielded a micro-averaged F1 score of 44%. Asymmetric training (enriching with the top-10 Google snippets) and testing (enriching with Yahoo suggested keywords) increased the F1 score to 46%. This is comparable with a symmetric approach based on feedback of the top-2 pseudo-relevant documents, in which a similar number of enrichment terms is added.
Keywords :
learning (artificial intelligence); pattern classification; query processing; relevance feedback; asymmetric learning; pseudo relevance feedback; query classification; query enrichment; query retrieval; training data; Advertising; Computer science; Educational institutions; Feedback; Heart; Information resources; Information systems; Metasearch; Testing; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location :
London
Print_ISBN :
978-1-4244-4456-4
Electronic_ISBN :
978-1-4244-4457-1
Type :
conf
DOI :
10.1109/ICADIWT.2009.5273856
Filename :
5273856