DocumentCode :
1966248
Title :
Query smearing: Improving classification accuracy and coverage of search results using logs
Author :
Oztekin, B. Uygar ; Chiu, Andy
Author_Institution :
Google Inc., Mountain View, CA, USA
fYear :
2009
fDate :
14-16 Sept. 2009
Firstpage :
135
Lastpage :
140
Abstract :
High-dimensional concept spaces have various applications in Web search, including personalized search, related page computation, diversity preservation, user interest inference, similarity computation, and advertisement targeting. Clustering and classification methods are common means of mapping documents and users into concept spaces. In most classification algorithms, precision (accuracy) and recall (coverage) tend to be competing aspects. In this paper, we introduce query smearing, an algorithm that can significantly improve both the accuracy and coverage of an existing classifier by leveraging information contained in fully anonymized search engine logs. Starting with a potentially incomplete seed classification, it expands the classification information to cover various items in search engine logs using a weighted majority voting scheme. The technique is similar to semi-supervised learning approaches and may be classified as one, but it differs notably from most such approaches. In particular, initial labels are not fully trusted for accuracy or completeness (hence, after the first stage, they can be thrown away), and additional relationships between classified items are used extensively to guide the process. Empirical evaluation shows that our algorithm performs well under the following assumptions: (i) the search engine log contains a sufficiently large number of query transactions, (ii) most results of most queries are relevant and on-topic, and (iii) a sufficient fraction of search results are classified in the seed classification, and those classifications are reasonably accurate (but not necessarily complete).
Keywords :
pattern classification; query processing; search engines; Web search; advertisement targeting; classification accuracy; clustering method; diversity preservation; high-dimensional concept spaces; personalized search; query smearing; related page computation; search engine logs; search results coverage; semi-supervised learning approach; similarity computation; user interest inference; Classification algorithms; Clustering algorithms; Inference algorithms; Search engines; Space technology; Text categorization; Uniform resource locators; Videos; Voting; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Sciences, 2009. ISCIS 2009. 24th International Symposium on
Conference_Location :
Guzelyurt
Print_ISBN :
978-1-4244-5021-3
Electronic_ISBN :
978-1-4244-5023-7
Type :
conf
DOI :
10.1109/ISCIS.2009.5291837
Filename :
5291837