• DocumentCode
    2011334
  • Title

    Towards Semi-supervised Transcription of Handwritten Historical Weather Reports

  • Author

    Richarz, Jan ; Vajda, Szilárd ; Fink, Gernot A.

  • Author_Institution
    Dept. of Comput. Sci., Tech. Univ. Dortmund, Dortmund, Germany
  • fYear
    2012
  • fDate
    27-29 March 2012
  • Firstpage
    180
  • Lastpage
    184
  • Abstract
    This paper addresses the automatic transcription of handwritten documents with a regular tabular structure. A method for extracting machine-printed tables from images is proposed, using very little prior knowledge about the document layout. The detected table serves as a query for retrieving and fitting a structural template, which is then used to extract handwritten text fields. A semi-supervised learning approach is applied to these fields, aiming at minimizing the human labeling effort for recognizer training. The effectiveness of the proposed approach is demonstrated experimentally on a set of historical weather reports. Compared to using all labels, competitive recognition performance is achieved by labeling only a small fraction of the data, keeping the required human effort very low.
  • Keywords
    feature extraction; geophysics computing; handwritten character recognition; history; image retrieval; learning (artificial intelligence); text analysis; text detection; automatic handwritten document transcription; document layout; handwritten historical weather reports; handwritten text field extraction; human labeling effort minimisation; machine printed table extraction method; query processing; regular tabular structure; semisupervised learning; semisupervised transcription; structural template fitting; structural template retrieval; training recognizer; Handwriting recognition; Humans; Labeling; Meteorology; Principal component analysis; Text analysis; Training; document analysis; handwriting recognition; historical documents; layout analysis; semi-supervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
  • Conference_Location
    Gold Coast, QLD
  • Print_ISBN
    978-1-4673-0868-7
  • Type

    conf

  • DOI
    10.1109/DAS.2012.91
  • Filename
    6195359