Regional Information Center for Science and Technology - DOM-Based Web Pages to Determine the Structure of the Similarity Algorithm

DocumentCode :

2917109

Title :

DOM-Based Web Pages to Determine the Structure of the Similarity Algorithm

Author :

Kang, Chunying

Author_Institution :

Coll. of Inf. Sci. & Technol., Heilongjiang Univ., Harbin, China

Volume :

fYear :

2009

fDate :

21-22 Nov. 2009

Firstpage :

245

Lastpage :

248

Abstract :

Web data currently exists mainly in the form of HTML pages. Once parsed and rendered by a browser, such pages are suitable for people to browse but not for data exchange or processing by a computer. This article decomposes each web page into a DOM tree and, starting from the body root node, traverses the tree in breadth-first order, comparing the nodes of the two DOM trees layer by layer and counting the changes at each layer. The changes across all layers are then summed: if the total is less than a certain threshold, the two pages are considered structurally similar; otherwise they are dissimilar. Because the algorithm considers only page structure and ignores page content, it has high operating efficiency, and because it is not limited to any specific web page, it is broadly applicable.
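
The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how the "changes" at each layer are counted, so the sketch assumes a layer's change count is the size of the symmetric multiset difference of tag names at that depth. The names `structural_difference` and `is_similar`, and the default threshold of 3, are hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser

# Elements that never have children or a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified, tag-only DOM tree (text and attributes ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse_body(html):
    """Parse HTML and return the <body> node (or the document root if absent)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    queue = deque([builder.root])
    while queue:
        node = queue.popleft()
        if node.tag == "body":
            return node
        queue.extend(node.children)
    return builder.root


def layer_tags(root):
    """Breadth-first traversal: one Counter of tag names per tree layer."""
    layers, current = [], [root]
    while current:
        layers.append(Counter(n.tag for n in current))
        current = [c for n in current for c in n.children]
    return layers


def structural_difference(html_a, html_b):
    """Sum of per-layer changes (assumed: symmetric multiset difference of tags)."""
    la, lb = layer_tags(parse_body(html_a)), layer_tags(parse_body(html_b))
    total = 0
    for depth in range(max(len(la), len(lb))):
        a = la[depth] if depth < len(la) else Counter()
        b = lb[depth] if depth < len(lb) else Counter()
        total += sum(((a - b) + (b - a)).values())
    return total


def is_similar(html_a, html_b, threshold=3):
    """Pages are structurally similar if the total change is below the threshold."""
    return structural_difference(html_a, html_b) < threshold
```

Because only tag names are compared, two pages with identical markup skeletons but different text yield a difference of zero, which matches the abstract's point that content is ignored. The threshold would need to be tuned for real pages; the value used here is arbitrary.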

Keywords :

Web sites; document handling; electronic data interchange; hypermedia markup languages; object-oriented programming; online front-ends; tree searching; DOM node tree; DOM-based Web pages; HTML language; HTML pages; Web browser; Web data; breadth-first traversal order DOM tree; data exchange; page structure information; similarity algorithm; Computer displays; Data mining; Educational institutions; HTML; Information analysis; Information technology; Intelligent structures; Internet; Statistics; Web pages; DOM; Similarity Algorithm; Web;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Intelligent Information Technology Application, 2009. IITA 2009. Third International Symposium on

Conference_Location :

Nanchang

Print_ISBN :

978-0-7695-3859-4

Type :

conf

DOI :

10.1109/IITA.2009.218

Filename :

5369421

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2917109