• DocumentCode
    2053018
  • Title
    Publishing Historical Texts on the Semantic Web - A Case Study
  • Author
    Ahonen, Eeva ; Hyvönen, Eero
  • Author_Institution
    Semantic Comput. Res. Group (SeCo), Helsinki Univ. of Technol. (TKK), Helsinki, Finland
  • fYear
    2009
  • fDate
    14-16 Sept. 2009
  • Firstpage
    167
  • Lastpage
    173
  • Abstract
    Historical texts are an important component of cultural heritage, and are being digitized and published on the web in various portals for researchers and the public. However, searching them and linking them with related content is challenging due to the unstructured text form, digitization errors, and the differences between old and modern language, including historical names (e.g. of places), in the terms used for querying. This paper addresses these issues by presenting an approach and a system for publishing old texts on the semantic web. As a case study, an existing historical newspaper archive on the web is considered. In our model, semantic metadata is added to the text using automated concept extraction methods. Search is implemented with semantic techniques, by creating a multi-faceted search interface for the text materials. Problems due to OCR errors and spelling variants are addressed with a fuzzy string matching algorithm that tries to guess the corresponding words in a lexicon and suggests corrected word forms. References between texts in the library, as well as links between the library and external knowledge sources, are formed by using shared ontologies for semantic annotations.
  • Keywords
    digital libraries; electronic publishing; history; library automation; meta data; optical character recognition; semantic Web; string matching; OCR errors; automated text extraction methods; cultural heritage; digitization errors; fuzzy string matching algorithm; historical newspaper archive; historical texts publishing; multi-faceted search interface; nonstructured text; semantic annotations; semantic metadata; shared ontologies; Cultural differences; Error correction; Joining processes; Ontologies; Optical character recognition software; Optical materials; Portals; Publishing; Software libraries; automatic semantic annotation; historical newspapers; multi-faceted search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing, 2009. ICSC '09. IEEE International Conference on
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-4962-0
  • Electronic_ISBN
    978-0-7695-3800-6
  • Type
    conf
  • DOI
    10.1109/ICSC.2009.9
  • Filename
    5298609
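
The abstract mentions a fuzzy string matching step that guesses lexicon words for OCR-damaged or variant spellings and suggests corrected forms. The record does not say which algorithm the authors used, so the following is only a minimal sketch of the general idea, assuming Python's standard `difflib` similarity ratio as a stand-in. The lexicon contents, the function name `suggest_corrections`, and the cutoff value are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical lexicon of modern word forms; a real system would load
# a full word list or the labels of an ontology instead.
LEXICON = ["helsinki", "newspaper", "library", "finland", "history"]


def suggest_corrections(token, lexicon=LEXICON, max_suggestions=3, cutoff=0.75):
    """Return lexicon words that closely resemble a possibly misread token.

    Similarity is difflib's SequenceMatcher ratio (an assumption made for
    illustration, not the method described in the paper).
    """
    return get_close_matches(token.lower(), lexicon, n=max_suggestions, cutoff=cutoff)


if __name__ == "__main__":
    for ocr_token in ["Helsinkl", "newspapcr", "xyzzy"]:
        print(ocr_token, "->", suggest_corrections(ocr_token))
```

Raising the cutoff yields fewer but safer suggestions; lowering it catches heavier OCR damage at the cost of more false matches.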