• DocumentCode
    2015691
  • Title
    Incremental Learning of First Order Logic Theories for the Automatic Annotations of Web Documents
  • Author
    Esposito, Floriana ; Ferilli, Stefano ; Mauro, Nicola Di ; Basile, Teresa M A
  • Author_Institution
    Univ. degli Studi di Bari, Bari
  • Volume
    2
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    1093
  • Lastpage
    1097
  • Abstract
    Organizing large repositories spread throughout the most diverse Web sites raises the problem of effective storage and efficient retrieval of documents. This can be achieved by selectively extracting from them the significant textual information, contained in particular layout components, which in turn depends on the identification of the correct document class. The continuous flow of new and different documents in a weakly structured environment like the Web calls for incrementality, that is, the ability to continuously update or revise faulty knowledge previously acquired, while the need to express structural relations among layout components suggests the exploitation of a powerful symbolic representation language. This paper proposes the application of incremental first-order logic learning techniques in the document layout preprocessing steps, supported by good results obtained in experiments on a real dataset.
  • Keywords
    Web sites; formal logic; information retrieval; learning (artificial intelligence); text analysis; Web documents; automatic annotations; first order logic theories; incremental learning; layout components; symbolic representation language; textual information; Automatic logic units; Data mining; Fault diagnosis; Indexing; Information retrieval; Learning systems; Ontologies; Organizing; Software libraries; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4377084
  • Filename
    4377084