DocumentCode
2533098
Title
Clustering of Web Search Results Based on Combination of Links and In-Snippets
Author
Yang, Nan ; Liu, Yue ; Yang, Gang
Author_Institution
Sch. of Inf., Renmin Univ. of China, Beijing, China
fYear
2011
fDate
21-23 Oct. 2011
Firstpage
108
Lastpage
113
Abstract
Search engines are a common tool for retrieving information from the Web, but the quality of the returned results is still far from satisfactory. Users must scan a long result list to find the information they actually want. Many works have focused on post-processing search results to make them easier to examine, and clustering is one of the common approaches. Term-based clustering was the first approach to clustering results, but it suffers from poor quality when the processed pages contain little text. Link-based clustering can overcome this problem, but the quality of its clusters depends heavily on the number of in-links and out-links that pages have in common. In this paper, we propose that the short text attached to an in-link is valuable information that helps achieve high clustering quality. To distinguish it from a general snippet, we call it an in-snippet. Based on the in-snippet, we propose a new clustering method that combines links and in-snippets. In our method, the similarity between pages consists of two parts: link similarity and term similarity. We design an algorithm to implement the clustering. To prevent bias from human judgments, the experimental datasets are collected from the Open Directory Project (DMOZ). Because DMOZ is a human-edited directory, datasets drawn from it are of higher quality and larger scale. We use entropy and F-measure to evaluate the quality of the final clusters. Compared with the link-based and pure term-based algorithms, our method achieves better clustering quality.
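The two-part similarity described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: cosine similarity over shared links, cosine similarity over in-snippet term counts, a weighted sum with a hypothetical weight `alpha`, and the function and field names.

```python
import math
from collections import Counter


def link_similarity(links_a, links_b):
    """Cosine similarity between two sets of link identifiers (assumed measure)."""
    if not links_a or not links_b:
        return 0.0
    return len(links_a & links_b) / math.sqrt(len(links_a) * len(links_b))


def term_similarity(snippet_a, snippet_b):
    """Cosine similarity between term-count vectors of two in-snippets (assumed measure)."""
    va = Counter(snippet_a.lower().split())
    vb = Counter(snippet_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(page_a, page_b, alpha=0.5):
    """Weighted sum of link and term similarity; alpha is a hypothetical weight."""
    link_sim = link_similarity(page_a["links"], page_b["links"])
    term_sim = term_similarity(page_a["in_snippet"], page_b["in_snippet"])
    return alpha * link_sim + (1 - alpha) * term_sim


# Two hypothetical result pages that share one link and two in-snippet terms.
page_p = {"links": {"a", "b"}, "in_snippet": "jaguar car dealer"}
page_q = {"links": {"b", "c"}, "in_snippet": "jaguar car price"}
score = combined_similarity(page_p, page_q)
```

With a combined score of this kind, a page that has few links in common with another can still be grouped with it when their in-snippets overlap, which is the motivation the abstract gives for combining the two signals.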
Keywords
Internet; information retrieval; pattern clustering; search engines; DMOZ; Web search; entropy; f-measure; in-snippet; link similarity; link-based clustering; search engine; term similarity; term-based clustering; Algorithm design and analysis; Clustering algorithms; Educational institutions; Entropy; Search engines; Vectors; Web pages; Clustering; Link analysis; Search engine result;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Information Systems and Applications Conference (WISA), 2011 Eighth
Conference_Location
Chongqing
Print_ISBN
978-1-4577-1812-0
Type
conf
DOI
10.1109/WISA.2011.28
Filename
6093575