DocumentCode :
2533098
Title :
Clustering of Web Search Results Based on Combination of Links and In-Snippets
Author :
Yang, Nan ; Liu, Yue ; Yang, Gang
Author_Institution :
Sch. of Inf., Renmin Univ. of China, Beijing, China
fYear :
2011
fDate :
21-23 Oct. 2011
Firstpage :
108
Lastpage :
113
Abstract :
Search engines are a common tool for retrieving information from the Web, but the quality of returned results is still far from satisfactory. Users must scan a long result list to find the information they actually want. Many works have focused on post-processing search results to make them easier to examine, and clustering is one of the most common approaches. Term-based clustering was the first approach used to cluster results, but it suffers from poor quality when the processed pages contain little text. Link-based clustering can overcome this problem, but the quality of its clusters depends heavily on the number of in-links and out-links that pages have in common. In this paper, we propose that the short text attached to an in-link is valuable information that helps achieve high clustering quality. To distinguish it from a general snippet, we call it an in-snippet. Based on the in-snippet, we propose a new clustering method that combines links and in-snippets. In our method, the similarity between pages consists of two parts: link similarity and term similarity. We design a corresponding algorithm to implement the clustering. To prevent bias from human judgments, the experimental datasets are collected from the Open Directory Project (DMOZ). Because DMOZ is a human-edited directory, datasets drawn from it have higher quality and larger scale. We use entropy and F-measure to evaluate the quality of the final clusters. Compared with the link-based and pure term-based algorithms, our method achieves better clustering quality.
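The two-part similarity described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption for illustration only: cosine similarity for both parts, a linear mix controlled by a hypothetical weight `alpha`, and the hypothetical field names `links` and `in_snippet_terms`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def combined_similarity(page_a, page_b, alpha=0.5):
    """Mix link similarity and term similarity.

    ASSUMPTION: the linear weighting and the default alpha are illustrative,
    not taken from the paper.
    """
    link_sim = cosine(Counter(page_a["links"]), Counter(page_b["links"]))
    term_sim = cosine(
        Counter(page_a["in_snippet_terms"]),
        Counter(page_b["in_snippet_terms"]),
    )
    return alpha * link_sim + (1 - alpha) * term_sim
```

With this sketch, two pages that share all their links but no in-snippet terms score roughly `alpha`, which shows how each part contributes independently to the final similarity.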
Keywords :
Internet; information retrieval; pattern clustering; search engines; DMOZ; Web search; entropy; f-measure; in-snippet; link similarity; link-based clustering; search engine; term similarity; term-based clustering; Algorithm design and analysis; Clustering algorithms; Educational institutions; Entropy; Search engines; Vectors; Web pages; Clustering; Link analysis; Search engine result;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information Systems and Applications Conference (WISA), 2011 Eighth
Conference_Location :
Chongqing
Print_ISBN :
978-1-4577-1812-0
Type :
conf
DOI :
10.1109/WISA.2011.28
Filename :
6093575