DocumentCode :
2014710
Title :
A Data Mining Approach to Reading Order Detection
Author :
Ceci, Michelangelo ; Berardi, Margherita ; Porcelli, Giuseppe A. ; Malerba, Donato
Author_Institution :
Univ. of Bari, Bari
Volume :
2
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
924
Lastpage :
928
Abstract :
Determining the reading order for layout components extracted from a document image can be a crucial problem for several applications. It enables the reconstruction of a single textual element from texts associated with multiple layout components and makes both information extraction and content-based retrieval of documents more effective. A common aspect of all methods reported in the literature is that they strongly depend on the specific domain and are scarcely reusable when the classes of documents or the task at hand change. In this paper, we investigate the problem of detecting the reading order of layout components by resorting to a data mining approach which acquires the domain-specific knowledge from a set of training examples. The input of the learning method is the description of the "chains" of layout components defined by the user. Only spatial information is exploited to describe a chain, thus making the proposed approach also applicable to cases in which no text can be associated with a layout component. The method induces a probabilistic classifier based on the Bayesian framework, which is used for reconstructing either single or multiple chains of layout components. It has been evaluated on a set of document images.
Keywords :
Bayes methods; content-based retrieval; data mining; document image processing; learning (artificial intelligence); pattern classification; probability; Bayesian framework; data mining approach; document image; domain specific knowledge; information extraction; layout components; learning method; probabilistic classifier; reading order detection; Bayesian methods; Content based retrieval; Data mining; Encoding; Image recognition; Image reconstruction; Information retrieval; Labeling; Learning systems; Predictive models
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4377050
Filename :
4377050