• DocumentCode
    3487187
  • Title
    Automatic Ground Truth Generation of Camera Captured Documents Using Document Image Retrieval
  • Author
    Ahmed, Sheraz ; Kise, Koichi ; Iwamura, Masakazu ; Liwicki, Marcus ; Dengel, Andreas
  • Author_Institution
    German Res. Center for Artificial Intell. (DFKI), Kaiserslautern, Germany
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    528
  • Lastpage
    532
  • Abstract
    In this paper, a novel method for automatic ground truth generation of camera-captured document images is proposed. Currently, no dataset is available for camera-captured documents, and building such datasets manually is very laborious and costly. The proposed method is fully automatic and makes it possible to build very large-scale labeled datasets of camera-captured documents (i.e., millions of images) without any human intervention. Evaluation of samples generated by the proposed approach shows that 99.98% of the images are correctly labeled. The novelty of the proposed approach lies in the use of document image retrieval for automatic labeling, especially for camera-captured documents, which contain camera-specific distortions such as blur, occlusion, and perspective distortion.
  • Keywords
    document image processing; image retrieval; automatic ground truth generation; automatic labeling; camera captured document images; document image retrieval; Cameras; Databases; Degradation; Feature extraction; Optical character recognition software; Portable document format; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.111
  • Filename
    6628676