DocumentCode :
2143379
Title :
The SCRIBO Module of the Olena Platform: A Free Software Framework for Document Image Analysis
Author :
Lazzara, Guillaume ; Levillain, Roland ; Géraud, Thierry ; Jacquelet, Yann ; Marquegnies, Julien ; Crépin-Leblond, Arthur
Author_Institution :
EPITA R&D Lab., Le Kremlin-Bicetre, France
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
252
Lastpage :
258
Abstract :
Electronic documents are becoming increasingly usable thanks to better and more affordable network, storage and computational facilities. But to benefit from computer-aided document management, paper documents must be digitized and analyzed. This task can be challenging at several levels. Data may be of multiple types, thus requiring different, adapted processing chains. The tools to be developed should also take into account the needs and knowledge of users, and so range from a simple graphical application to a complete programming framework. Finally, the data sets to process may be large. In this paper, we present a set of features that a Document Image Analysis framework should provide to handle these issues. In particular, a good strategy to address both flexibility and efficiency issues is the Generic Programming (GP) paradigm. These ideas are implemented as an open source module, SCRIBO, built on top of Olena, a generic and efficient image processing platform. Our solution features services such as preprocessing filters, text detection, page segmentation and document reconstruction (as XML, PDF or HTML documents). This framework, composed of reusable software components, can be used to create full-fledged graphical applications, small utilities, or processing chains to be integrated into third-party projects.
Keywords :
document image processing; object detection; public domain software; Olena platform; SCRIBO module; computer-aided document management; document reconstruction; electronic document image analysis; free software framework; generic programming paradigm; open source module; page segmentation; preprocessing filters; text detection; Algorithm design and analysis; IP networks; Image reconstruction; Libraries; Portable document format; Programming; Software; Document Image Analysis; Free Software; Generic Programming; Reusability; Software Design;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.59
Filename :
6065314