• DocumentCode
    1621239
  • Title
    Reusing of information constructed in HTML documents: A conversion of HTML into OWL
  • Author
    Hwangbo, Hoon; Lee, Hongchul
  • Author_Institution
    Dept. of Inf. & Manage. Eng., Korea Univ., Seoul
  • fYear
    2008
  • Firstpage
    871
  • Lastpage
    875
  • Abstract
    There have been efforts to build a knowledge-based web, represented by the Semantic Web. Within this trend, however, HTML is not appropriate as a language for ontologies or for structuring information. Because a vast amount of information already exists in HTML, it seems rational to reuse that data. Previous studies are not sufficient to convert HTML into OWL broadly, because they focus mainly on converting structured data (table tags) and offer only simple implementations. In addition, GRDDL, a W3C recommendation, needs an additional script for each conversion, and its output format is RDF, which has some restrictions. This paper proposes a conversion in three steps: (1) extracting information, (2) acquiring triples, and (3) constructing an ontology. There are two types of information: text-formed and non-text-formed. In addition, there are two kinds of tags: those that contain only text-formed information and those that contain both text-formed and non-text-formed information. Depending on the type of tag, we classify tags into categories and set rules for each category. Using those rules, we generate triples and finally construct an ontology.
  • Keywords
    classification; hypermedia markup languages; knowledge representation languages; semantic Web; text analysis; HTML document; OWL; information extraction; information reuse; nontext-formed information; ontology construction; structured data conversion; tag classification; triple acquiring; Data mining; HTML; Image converters; Joining processes; Ontologies; Paper technology; Resource description framework; XML; Analyzing system of English grammar; Conversion; Data extraction; Reusing information
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control, Automation and Systems, 2008. ICCAS 2008. International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-89-950038-9-3
  • Electronic_ISBN
    978-89-93215-01-4
  • Type
    conf
  • DOI
    10.1109/ICCAS.2008.4694654
  • Filename
    4694654