  • DocumentCode
    3012802
  • Title
    An automated change-detection algorithm for HTML documents based on semantic hierarchies
  • Author
    Lim, Seung-Jin ; Ng, Yiu-Kai
  • Author_Institution
    Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    303
  • Lastpage
    312
  • Abstract
    The data at many Web sites is changing rapidly, and a significant amount of this data is presented in HTML documents that consist of markups and data contents. Although XML is becoming more popular for data exchange, the presentation of data contained in XML documents is given, by and large, in the HTML format using XSL(T). Since HTML was designed to “display” data from the human perspective, it is not trivial for a machine to detect (hierarchical) changes of data in an HTML document. In this paper, we propose a heuristic algorithm, called SCD (Semantic Change Detection), to detect semantic changes to the hierarchical data contents in any two HTML documents automatically. Semantic changes differ from syntactic changes since the latter refer to changes of data contents with respect to markup structures according to the HTML grammar. SCD does not require pre-processing, nor any knowledge of the internal structure of the source documents beforehand. The time complexity of SCD is O[(|X|×|Y|)log(|X|×|Y|)], where |X| and |Y| are the numbers of unique branches in the syntactic hierarchies of any two given HTML documents, respectively.
  • Keywords
    computational complexity; hypermedia markup languages; information resources; HTML documents; HTML grammar; SCD algorithm; Web sites; XML documents; XSL(T); changing rapidly data; data contents; data display; data exchange; data presentation; heuristic algorithm; hierarchical data contents; markup structures; semantic change detection algorithm; semantic hierarchies; syntactic hierarchies; time complexity; unique branches; Change detection algorithms; Computer science; Displays; Eyes; HTML; Heuristic algorithms; Humans; Testing; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2001. Proceedings. 17th International Conference on
  • Conference_Location
    Heidelberg
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1001-9
  • Type
    conf
  • DOI
    10.1109/ICDE.2001.914842
  • Filename
    914842