DocumentCode
2183466
Title
Improving Web clustering by cluster selection
Author
Crabtree, Daniel ; Gao, Xiaoying ; Andreae, Peter
Author_Institution
Sch. of Math., Stat. & Comput. Sci., Victoria Univ. of Wellington, New Zealand
fYear
2005
fDate
19-22 Sept. 2005
Firstpage
172
Lastpage
178
Abstract
Web page clustering is a technology that puts semantically related Web pages into groups and is useful for categorizing, organizing, and refining search results. When clustering using only textual information, suffix tree clustering (STC) outperforms other clustering algorithms by making use of phrases and allowing clusters to overlap. One problem of STC and other similar algorithms is how to select a small set of clusters to display to the user from a very large set of generated clusters. The cluster selection method used in STC is flawed in that it does not handle overlapping clusters appropriately. This paper introduces a new cluster scoring function and a new cluster selection algorithm to overcome the problems with overlapping clusters, which are combined with STC to make a new clustering algorithm ESTC. This paper's experiments show that ESTC significantly outperforms STC and that even with less data ESTC performs similarly to a commercial clustering search engine.
Keywords
Web sites; classification; pattern clustering; search engines; semantic Web; Web page clustering; cluster scoring function; cluster selection; search engine; suffix tree clustering; textual information; Clustering algorithms; Computer science; Displays; Filters; Internet; Mathematics; Organizing; Search engines; Statistics; Web pages; web clustering
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2415-X
Type
conf
DOI
10.1109/WI.2005.75
Filename
1517839