Regional Information Center for Science and Technology - An approach to identify duplicated web pages

DocumentCode :

2415606

Title :

An approach to identify duplicated web pages

Author :

Lucca, Giuseppe Antonio Di ; Penta, Massimiliano Di ; Fasolino, Anna Rita

Author_Institution :

Dipt. di Informatica e Sistemistica, Università di Napoli Federico II, Naples, Italy

fYear :

2002

fDate :

2002

Firstpage :

481

Lastpage :

486

Abstract :

A relevant consequence of the expansion of the web and e-commerce is the growing demand for new web sites and web applications. As a result, web sites and applications are usually developed without a formalized process, and web pages are coded directly and incrementally, with new pages obtained by duplicating existing ones. Duplicated web pages, which have the same structure and differ only in the data they include, can be considered clones. Identifying clones may reduce the effort devoted to testing, maintaining and evolving web sites and applications. Moreover, clone detection across different web sites can reveal cases of possible plagiarism. In this paper we propose an approach, based on similarity metrics, to detect duplicated pages in web sites and applications implemented with HTML and ASP technology. The proposed approach has been assessed by analyzing several web sites and web applications. The results obtained are reported in the paper with respect to some case studies.
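
The abstract does not say which similarity metrics the paper uses. As an illustration only, the sketch below assumes one plausible choice: each page is reduced to its sequence of HTML tags (text and attributes discarded), and two pages are compared by the edit distance between those sequences. The names `tag_sequence`, `edit_distance` and `structural_similarity` are hypothetical and do not come from the paper, and server-side ASP code is not handled.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the list of opening and closing tags found in an HTML string."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def structural_similarity(html_a, html_b):
    """Similarity in [0, 1]; 1.0 means the two tag sequences are identical."""
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    page_a = "<html><body><h1>Red shoes</h1><p>Price: 10</p></body></html>"
    page_b = "<html><body><h1>Blue hat</h1><p>Price: 25</p></body></html>"
    page_c = "<html><body><ul><li>a</li><li>b</li></ul></body></html>"
    print(structural_similarity(page_a, page_b))  # same structure, different data
    print(structural_similarity(page_a, page_c))  # different structure
```

Under this assumption, pages that share a structure and differ only in their data get a similarity of 1.0, which matches the abstract's notion of a clone. A lower threshold (a tuning choice, not something the abstract specifies) would also flag near-duplicates.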

Keywords :

Internet; electronic commerce; information resources; software metrics; clone detection; duplication; source code clones; web engineering; web page metrics; web site analysis; Application software; Application specific processors; Cloning; HTML; Plagiarism; Software testing; Time to market; US Department of Transportation; Web pages

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Software and Applications Conference, 2002. COMPSAC 2002. Proceedings. 26th Annual International

ISSN :

0730-3157

Print_ISBN :

0-7695-1727-7

Type :

conf

DOI :

10.1109/CMPSAC.2002.1045051

Filename :

1045051

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2415606