• DocumentCode
    3422612
  • Title
    Hierarchies in HTML documents: linking text to concepts
  • Author
    Burget, Radek
  • Author_Institution
    Faculty of Inf. Technol., Brno Univ. of Technol., Czech Republic
  • fYear
    2004
  • fDate
    30 Aug.-3 Sept. 2004
  • Firstpage
    186
  • Lastpage
    190
  • Abstract
    For the semantic Web to succeed, tools are needed for linking the large amounts of data currently available in HTML documents to semantic Web ontologies. Because HTML code is enormously variable, defining direct bindings between patterns of HTML code and concepts is very limiting. We propose an approach based on modeling the visual part of the rendered document and describing the key characteristics of the data presentation in a general way. As a next step, we propose a way of using this model to locate instances of the concepts in the document by means of approximate tree matching algorithms and regular expressions.
  • Keywords
    hypermedia markup languages; ontologies (artificial intelligence); semantic Web; text analysis; tree data structures; very large databases; HTML code; HTML document; data presentation; regular expression; semantic Web ontology; tree matching algorithm; Databases; HTML; Information retrieval; Information technology; Joining processes; Navigation; Ontologies; Semantic Web; Solids; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on
  • ISSN
    1529-4188
  • Print_ISBN
    0-7695-2195-9
  • Type
    conf
  • DOI
    10.1109/DEXA.2004.1333471
  • Filename
    1333471