DocumentCode :
3228441
Title :
Query Directed Web Page Clustering
Author :
Crabtree, Daniel ; Andreae, Peter ; Gao, Xiaoying
Author_Institution :
Sch. of Math., Stat. & Comput. Sci., Victoria Univ. of Wellington
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
202
Lastpage :
210
Abstract :
Web page clustering methods categorize and organize search results into semantically meaningful clusters that assist users with search refinement, but finding clusters that are semantically meaningful to users is difficult. In this paper, we describe a new Web page clustering algorithm, QDC, which uses the user's query as part of a reliable measure of cluster quality. The new algorithm has five key innovations: a new query directed cluster quality guide that uses the relationship between clusters and the query, an improved cluster merging method that generates semantically coherent clusters by using cluster description similarity in addition to cluster overlap, a new cluster splitting method that fixes the cluster chaining or cluster drifting problem, an improved heuristic for cluster selection that uses the query directed cluster quality guide, and a new method of improving clusters by ranking the pages by relevance to the cluster. We evaluate QDC by comparing its clustering performance against that of four other algorithms on eight data sets (four use full text data and four use snippet data) by using eleven different external evaluation measurements. We also evaluate QDC by informally analysing its real world usability and performance through comparison with six other algorithms on four data sets. QDC provides a substantial performance improvement over other Web page clustering algorithms.
Keywords :
Internet; query processing; text analysis; cluster drifting problem; cluster splitting method; query directed Web page clustering; search refinement; Clustering algorithms; Clustering methods; Computer science; Mathematics; Merging; Partitioning algorithms; Statistics; Technological innovation; Usability; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7
Type :
conf
DOI :
10.1109/WI.2006.142
Filename :
4061367