• DocumentCode
    183467
  • Title

    ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS)

  • Author

    Andreu Sanchez, Joan ; Romero, Veronica ; Toselli, Alejandro Hector ; Vidal, Enrique

  • Author_Institution
    Pattern Recognition & Human Language Technol. Res. Center, Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2014
  • fDate
    1-4 Sept. 2014
  • Firstpage
    785
  • Lastpage
    790
  • Abstract
    A contest on Handwritten Text Recognition organised in the context of the ICFHR 2014 conference is described. Two tracks with increased freedom on the use of training data were proposed and three research groups participated in these two tracks. The handwritten images for this contest were drawn from an English data set which is currently being considered in the tranScriptorium project. The goal of this project is to develop innovative, efficient and cost-effective solutions for the transcription of historical handwritten document images, focusing on four languages: English, Spanish, German and Dutch. For the English language, the so-called "Bentham collection" is being considered in tranScriptorium. It encompasses a large set of manuscripts written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). A small subset of this collection has been chosen for the present HTR competition. The selected subset has been written by several hands (Bentham himself and his secretaries) and entails significant variabilities and difficulties regarding the quality of text images and writing styles. Training and test data were provided in the form of carefully segmented line images, along with the corresponding transcripts. The three participants achieved very good results, with transcription word error rates ranging from 15.0% down to 8.6%.
  • Keywords
    document image processing; handwritten character recognition; image segmentation; natural language processing; text analysis; Bentham collection; Dutch language; English language; German language; HTR competition; HTRtS; ICFHR2014 competition; Spanish language; handwritten text recognition; historical handwritten document image transcription; segmented line images; tranScriptorium datasets; Adaptive optics; Artificial neural networks; Hidden Markov models; Histograms; Text recognition; Training; Training data; Handwritten Text Recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
  • Conference_Location
    Heraklion
  • ISSN
    2167-6445
  • Print_ISBN
    978-1-4799-4335-7
  • Type

    conf

  • DOI
    10.1109/ICFHR.2014.137
  • Filename
    6981116