DocumentCode :
3132908
Title :
An empirical study on keyword-based Web site clustering
Author :
Ricca, Filippo ; Tonella, Paolo ; Girardi, Christian ; Pianta, Emanuele
Author_Institution :
Centro per la Ricerca Scientifica e Tecnologica, ICT, Povo, Italy
fYear :
2004
fDate :
24-26 June 2004
Firstpage :
204
Lastpage :
213
Abstract :
Web site evolution is characterized by limited support for the understanding activities of developers. In fact, design diagrams are often missing or outdated. A potentially interesting option is to reverse engineer high-level views of Web sites from the content of the Web pages. Clustering is a valuable technique for this purpose. Web pages can be clustered together based on the similarity of summary information about their content, represented as a list of automatically extracted keywords. This work presents an empirical study conducted to determine how meaningful the clusters automatically produced from the analysis of Web page content are to Web developers. Natural language processing (NLP) plays a central role in content analysis and keyword extraction. Thus, a second objective of the study was to assess the contribution of some shallow NLP techniques to the clustering task.
Keywords :
Web sites; content management; natural languages; reverse engineering; Web pages; Web site clustering; Web site evolution; content analysis; design diagrams; keyword extraction; natural language processing; Clustering algorithms; Clustering methods; Conferences; Data mining; Information resources; Navigation; Web page design
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Program Comprehension, 2004. Proceedings. 12th IEEE International Workshop on
ISSN :
1092-8138
Print_ISBN :
0-7695-2149-5
Type :
conf
DOI :
10.1109/WPC.2004.1311062
Filename :
1311062