• DocumentCode
    379036
  • Title
    An ontology-based HTML to XML conversion using intelligent agents
  • Author
    Potok, Thomas E. ; Elmore, Mark T. ; Reed, Joel W. ; Samatova, Nagiza F.
  • Author_Institution
    Comput. Sci. & Math. Div., Oak Ridge Nat. Lab., TN, USA
  • fYear
    2002
  • fDate
    7-10 Jan. 2002
  • Firstpage
    1220
  • Lastpage
    1229
  • Abstract
    How to organize and classify large amounts of heterogeneous information accessible over the Internet is a major problem faced by industry, government, and military organizations. XML is clearly a potential solution to this problem; however, a significant challenge is how to automatically convert information currently expressed in standard HTML format to an XML format. Within the Virtual Information Processing Agent Research (VIPAR) project, we have developed a process using Internet ontologies and intelligent software agents to perform automatic HTML to XML conversion for Internet newspapers. The VIPAR software is based on a number of significant research breakthroughs, most notably the ability of intelligent agents to use a flexible RDF ontology to transform HTML documents into XML-tagged documents. The VIPAR system is currently deployed at the U.S. Pacific Command, Camp Smith, HI, traversing up to 17 Internet newspapers daily.
  • Keywords
    classification; electronic data interchange; electronic publishing; hypermedia markup languages; information resources; software agents; HTML to XML conversion; Internet newspapers; VIPAR project; Virtual Information Processing Agent Research project; flexible RDF ontology; intelligent software agents; tagged documents; Defense industry; Government; HTML; Information processing; Intelligent agent; Internet; Ontologies; Resource description framework; Software agents; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Sciences, 2002. HICSS. Proceedings of the 35th Annual Hawaii International Conference on
  • Print_ISBN
    0-7695-1435-9
  • Type
    conf
  • DOI
    10.1109/HICSS.2002.994071
  • Filename
    994071