  • DocumentCode
    2530513
  • Title
    Machine learning methods for automatically processing historical documents: from paper acquisition to XML transformation
  • Author
    Esposito, F. ; Malerba, D. ; Semeraro, G. ; Ferilli, S. ; Altamura, O. ; Basile, T. M. A. ; Berardi, M. ; Ceci, M. ; Di Mauro, N.
  • Author_Institution
    Dipt. di Informatica, Bari Univ., Italy
  • fYear
    2004
  • fDate
    2004
  • Firstpage
    328
  • Lastpage
    335
  • Abstract
    One of the aims of the EU project COLLATE is to design and implement a Web-based collaboratory for archives, scientists and end-users working with digitized cultural material. Since the originals of such material are often unique and scattered across various archives, making them widely accessible is seriously problematic. One solution is to develop intelligent document processing tools that automatically transform printed documents into a Web-accessible form such as XML. Here, we propose the use of a document processing system, WISDOM++, which makes heavy use of machine learning techniques to perform this task, and report promising results obtained in preliminary experiments.
  • Keywords
    XML; digital libraries; document handling; history; learning (artificial intelligence); records management; COLLATE EU project; WISDOM++ document processing system; Web-based collaboratory; XML transformation; automatic historical document processing tools; digitized cultural material; machine learning; paper acquisition; Collaborative work; Cultural differences; Image sequence analysis; Layout; Learning systems; Optical character recognition software; Scattering; Software libraries; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL 2004)
  • Print_ISBN
    0-7695-2088-X
  • Type
    conf
  • DOI
    10.1109/DIAL.2004.1263262
  • Filename
    1263262