• DocumentCode
    1638739
  • Title
    OCD: An Optimized and Canonical Document Format
  • Author
    Bloechle, Jean-Luc ; Lalanne, Denis ; Ingold, Rolf
  • Author_Institution
    Dept. of Inf., Univ. of Fribourg, Fribourg, Switzerland
  • fYear
    2009
  • Firstpage
    236
  • Lastpage
    240
  • Abstract
    Revealing and manipulating the structured content of PDF documents is a difficult task that requires pre-processing and reverse-engineering techniques. In this paper, we present OCD, an optimized, easy-to-process, and canonical format for representing structured electronic documents. We describe the system and methods used to reverse engineer PDF documents into the OCD format, as well as the techniques used to optimize it. Finally, we report concrete evaluations of the OCD format's compactness and restructuring performance.
  • Keywords
    document handling; optimisation; reverse engineering; OCD format; PDF document; canonical document format; structured electronic document representation; Concrete; Informatics; Labeling; Optimization methods; Performance evaluation; Reverse engineering; Speech synthesis; Standards publication; Text analysis; Text processing; OCD; PDF; XCDF; XML; logical structure; physical structure; reverse-engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.159
  • Filename
    5277720