Title :
Search scripts mining from wisdom of the crowds
Author :
Wang, Chieh-Jen ; Chen, Hsin-Hsi
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Abstract :
This paper mines sequences of actions, called search scripts, from query logs that record the search experiences of a large population of users. Search scripts can be applied to predict users' search needs, improve retrieval effectiveness, recommend advertisements, and so on. Information quality, topic diversity, query ambiguity, and URL relevancy are the major challenges in search script mining. In this paper, we calculate the relevance of URLs, adopt the Open Directory Project (ODP) categories to disambiguate queries and URLs, explore various features and clustering algorithms for intent clustering, and identify critical actions from each intent cluster to form a search script. Experiments show that the model based on a complete-link hierarchical clustering algorithm with the features of query terms, relevant URLs, and disambiguated ODP categories performs best. Search scripts are generated from this best model. When only search scripts containing a single intent are considered correct, the accuracy of the action identification algorithm is 0.4650. If search scripts containing a major intent are also counted as correct, the accuracy increases to 0.7315.
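The intent clustering step named in the abstract can be illustrated with a minimal sketch of complete-link hierarchical clustering. The abstract does not give the distance measure, the stopping rule, or the feature encoding, so everything below is an assumption for illustration only: each search session is a set of string features (query terms, relevant URLs, ODP categories), distance is Jaccard distance, and merging stops at a hypothetical threshold. The function names and toy data are invented here and are not taken from the paper.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two feature sets (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_link(c1, c2, items):
    """Complete-link distance: the largest pairwise distance between members."""
    return max(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)


def cluster(items, threshold):
    """Merge the closest pair of clusters until no pair is within `threshold`."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-link distance.
        (x, y), dist = min(
            (((a, b), complete_link(clusters[a], clusters[b], items))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical toy sessions: query terms, a clicked URL, and an ODP category.
sessions = [
    {"q:tennis", "q:racket", "url:example.com/rackets", "odp:Sports/Tennis"},
    {"q:tennis", "q:shoes", "url:example.com/rackets", "odp:Sports/Tennis"},
    {"q:python", "q:tutorial", "url:example.org/python", "odp:Computers/Programming"},
]
result = cluster(sessions, threshold=0.7)
print(result)  # [[0, 1], [2]]
```

The two tennis sessions share three of five distinct features (distance 0.4) and merge, while the programming session shares nothing with either and stays in its own cluster. Complete link scores a candidate merge by its worst-matching pair, which tends to keep clusters tight; that is one plausible reason it could suit single-intent grouping, though the abstract itself reports only that it performed best.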
Keywords :
data mining; pattern clustering; query processing; URL relevancy; action identification algorithm; action sequence mining; advertisement recommendation; complete link hierarchical clustering algorithm; crowd wisdom; information quality; open directory project categories; query ambiguity; query logs; retrieval effectiveness improvement; search script mining; topic diversity; user search need prediction; Accuracy; Clustering algorithms; Noise; Predictive models; Search engines; Sports equipment; Web pages; mining web logs; search script; web search enhancement;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4577-0652-3
DOI :
10.1109/ICSMC.2011.6083762