• DocumentCode
    3695256
  • Title
    Supporting early contextualization of textual content in digital documents on the Web
  • Author
    Bahaa Eldesouky; Menna Bakry; Heiko Maus; Andreas Dengel
  • Author_Institution
    Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
  • fYear
    2015
  • Firstpage
    1071
  • Lastpage
    1075
  • Abstract
    The World Wide Web is arguably the most important source of digital documents today. These documents consist mainly of unstructured and semi-structured data, comprising a wealth of information at the disposal of the DAR (Document Analysis and Recognition) community. Contextualization plays an important role in understanding the content of these documents. In this paper, we present an approach to the early contextualization of textual data in HTML documents. It combines automatic and semi-automatic annotation of named entities with user interaction to support contextualizing the content of digital documents as early as the authoring stage of their life cycle. We also present the results of an online experimental evaluation involving 120 human test subjects. The results show that our approach successfully produced semantically annotated versions of unstructured textual content that contain reliable contextual information, thus facilitating later document analysis stages.
  • Keywords
    "Text analysis","Semantics","Reliability","Information services","Electronic publishing","Internet","Blogs"
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDAR.2015.7333926
  • Filename
    7333926