DocumentCode
3012802
Title
An automated change-detection algorithm for HTML documents based on semantic hierarchies
Author
Lim, Seung-Jin ; Ng, Yiu-Kai
Author_Institution
Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
fYear
2001
fDate
2001
Firstpage
303
Lastpage
312
Abstract
The data at many Web sites is changing rapidly, and a significant amount of this data is presented in HTML documents that consist of markups and data contents. Although XML is becoming more popular for data exchange, the presentation of data contained in XML documents is given, by and large, in the HTML format using XSL(T). Since HTML was designed to “display” data from the human perspective, it is not trivial for a machine to detect (hierarchical) changes of data in an HTML document. In this paper, we propose a heuristic algorithm, called SCD (Semantic Change Detection), to detect semantic changes to the hierarchical data contents in any two HTML documents automatically. Semantic changes differ from syntactic changes since the latter refer to changes of data contents with respect to markup structures according to the HTML grammar. SCD does not require pre-processing, nor any knowledge of the internal structure of the source documents beforehand. The time complexity of SCD is O[(|X|×|Y|)log(|X|×|Y|)], where |X| and |Y| are the number of unique branches in the syntactic hierarchies of any two given HTML documents, respectively.
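To make the quantities in the complexity bound concrete, the following is a minimal sketch of how |X| and |Y| might be counted. It is not the SCD algorithm, which the abstract does not specify. It assumes that a "unique branch" means a distinct root-to-text tag path in the parsed document; that reading, along with the names `BranchCollector` and `unique_branches`, is a hypothetical choice made only for this illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BranchCollector(HTMLParser):
    """Collects unique root-to-text tag paths (an assumed notion of 'branch')."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.branches = set()  # distinct tag paths that lead to text

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Record the path only when the text node is not pure whitespace.
        if data.strip():
            self.branches.add("/".join(self.stack))


def unique_branches(html):
    """Return the set of distinct tag paths that carry text in `html`."""
    collector = BranchCollector()
    collector.feed(html)
    collector.close()
    return collector.branches


old_doc = "<html><body><h1>News</h1><ul><li>A</li><li>B</li></ul></body></html>"
new_doc = "<html><body><h1>News</h1><table><tr><td>A</td></tr></table></body></html>"

X = unique_branches(old_doc)
Y = unique_branches(new_doc)
pairs = len(X) * len(Y)  # the |X| x |Y| term inside the stated bound
print(sorted(X), sorted(Y), pairs)
```

Under this assumption, the two `li` items in `old_doc` share one branch, so both documents have two unique branches and four branch pairs are candidates for comparison. The n log n shape of the stated bound, with n = |X|×|Y|, matches the cost of sorting n items, but the abstract does not say which step of SCD dominates.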
Keywords
computational complexity; hypermedia markup languages; information resources; HTML documents; HTML grammar; SCD algorithm; Web sites; XML documents; XSL(T); changing rapidly data; data contents; data display; data exchange; data presentation; heuristic algorithm; hierarchical data contents; markup structures; semantic change detection algorithm; semantic hierarchies; syntactic hierarchies; time complexity; unique branches; Change detection algorithms; Computer science; Displays; Eyes; HTML; Heuristic algorithms; Humans; Testing; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2001. Proceedings. 17th International Conference on
Conference_Location
Heidelberg
ISSN
1063-6382
Print_ISBN
0-7695-1001-9
Type
conf
DOI
10.1109/ICDE.2001.914842
Filename
914842