DocumentCode
3007099
Title
Towards automatic clustering of similar pages in web applications
Author
De Lucia, Andrea ; Risi, Michele ; Tortora, Genoveffa ; Scanniello, Giuseppe
Author_Institution
Dipt. di Mat. e Inf., Univ. of Salerno, Fisciano, Italy
fYear
2009
fDate
25-26 Sept. 2009
Firstpage
99
Lastpage
108
Abstract
In this paper, we propose an automatic approach for grouping web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute the dissimilarity between pages, and then groups the pages by iteratively applying a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.
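The pipeline named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the Levenshtein part of the dissimilarity (Latent Semantic Indexing is omitted), and it substitutes a simple threshold graph with connected components for the paper's graph-theoretic clustering algorithm, whose details the abstract does not give. The function names, the length normalization, and the threshold value are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (an assumption)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster(pages: dict, threshold: float) -> list:
    """Link pages whose dissimilarity is <= threshold; return the components.

    Stand-in for the graph-theoretic step: connected components via union-find.
    """
    parent = {name: name for name in pages}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(pages, 2):
        if dissimilarity(pages[u], pages[v]) <= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in pages:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

For example, calling `cluster` on three hypothetical pages with contents `"hello world"`, `"hello word"`, and `"zzzzzzzz"` and a threshold of 0.2 puts the first two in one group and leaves the third on its own.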
Keywords
Web sites; content-based retrieval; graph theory; indexing; pattern clustering; string matching; Web site; graph theoretic clustering algorithm; group Web page; latent semantic indexing; Levenshtein string edit distance; Atmospheric measurements; Clustering algorithms; Navigation; Particle measurements; Prototypes; Weight measurement
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Systems Evolution (WSE), 2009 11th IEEE International Symposium on
Conference_Location
Edmonton, AB
ISSN
1550-4441
Print_ISBN
978-1-4244-5124-1
Type
conf
DOI
10.1109/WSE.2009.5631253
Filename
5631253