DocumentCode
2877894
Title
Using a Competitive Clustering Algorithm to Comprehend Web Applications
Author
De Lucia, Andrea ; Scanniello, Giuseppe ; Tortora, Genoveffa
Author_Institution
Dipt. di Matematica e Informatica, Salerno Univ., Fisciano
fYear
2006
fDate
23-24 Sept. 2006
Firstpage
33
Lastpage
40
Abstract
We propose an approach based on winner-takes-all, a competitive clustering algorithm, to support the comprehension of static and dynamic Web applications. The process first computes the distances between Web pages and then identifies similar pages with the winner-takes-all clustering algorithm. Two instances of the process are presented, identifying similar pages at the structural level and at the content level, respectively. The first instance encodes the page structure into a string and then uses the Levenshtein algorithm to compute the distances between pairs of pages. To group similar pages at the content level, we instead use latent semantic indexing to represent the pages as vectors in the concept space. The Euclidean distance between these vectors then gives the distances between pages that are passed as input to the clustering algorithm. A prototype that automates the identification of groups of similar pages has been implemented. The approach and the prototype have been assessed in a case study.
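The two distance measures and the clustering step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`levenshtein`, `euclidean`, `winner_takes_all`), the learning rate, the number of epochs, and the choice of initial prototypes are all assumptions made for illustration, and the page-structure encoding and the latent semantic indexing step are not reproduced because this record does not describe them. The clustering function is a generic winner-takes-all competitive learner on vectors, which roughly corresponds to the content-level instance.

```python
import math
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; each insert, delete, or substitute costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def euclidean(u, v) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def winner_takes_all(vectors, k, rate=0.1, epochs=50, seed=0):
    """Generic winner-takes-all clustering (assumed variant, not the paper's).

    For every input vector, only the nearest prototype (the winner) is moved
    toward that vector. Returns one cluster label per input vector.
    """
    rng = random.Random(seed)
    # Assumption: prototypes start as k distinct input vectors.
    prototypes = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(epochs):
        for v in vectors:
            w = min(range(k), key=lambda i: euclidean(prototypes[i], v))
            prototypes[w] = [p + rate * (x - p) for p, x in zip(prototypes[w], v)]
    return [min(range(k), key=lambda i: euclidean(prototypes[i], v))
            for v in vectors]


# Hypothetical usage: two structure strings, and four toy "concept space" vectors.
structural_distance = levenshtein("kitten", "sitting")
labels = winner_takes_all([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], k=2)
```

In this toy run the first two vectors end up sharing one label and the last two share the other. A real setting would replace the toy vectors with page representations produced by latent semantic indexing, and the example strings with the encoded page structures.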
Keywords
Web sites; indexing; pattern clustering; Euclidean distance; Levenshtein algorithm; Web application comprehension; Web pages; competitive clustering; latent semantic indexing; winner takes all clustering; Application software; Cloning; Clustering algorithms; HTML; Indexing; Prototypes; Reverse engineering; Software prototyping
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Site Evolution, 2006. WSE '06. Eighth IEEE International Symposium on
Conference_Location
Philadelphia, PA
ISSN
1550-4441
Print_ISBN
0-7695-2696-9
Type
conf
DOI
10.1109/WSE.2006.19
Filename
4027204