DocumentCode :
3487515
Title :
Document Classification and Page Stream Segmentation for Digital Mailroom Applications
Author :
Gordo, Albert ; Rusiñol, Marçal ; Karatzas, Dimosthenis ; Bagdanov, Andrew D.
Author_Institution :
Dept. Ciènc. de la Computació, Univ. Autònoma de Barcelona, Bellaterra, Spain
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
621
Lastpage :
625
Abstract :
In this paper we present a method for the segmentation of continuous page streams into multipage documents and the simultaneous classification of the resulting documents. We first present an approach to combine the multiple pages of a document into a single feature vector that represents the whole document. Despite its simplicity and low computational cost, the proposed representation yields results comparable to more complex methods in multipage document classification tasks. We then exploit this representation in the context of page stream segmentation. The most plausible segmentation of a page stream into a sequence of multipage documents is obtained by optimizing a statistical model that represents the probability of each segmented multipage document belonging to a particular class. Experimental results are reported on a large sample of real administrative multipage documents.
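The abstract describes two steps: pooling per-page descriptors into one document vector, and choosing the segmentation of the stream that maximizes the class probability of the resulting documents. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes pages are fixed-length histogram-like vectors, that pooling is an element-wise sum followed by L1 normalization, and that a segmentation is scored as the sum of each segment's best class log-probability, maximized by dynamic programming over split points. `aggregate_pages`, `segment_stream`, `nearest_centroid_log_probs` and the toy centroids are hypothetical stand-ins for a trained page representation and classifier.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# (start page, end page exclusive, predicted class index)
Segment = Tuple[int, int, int]


def aggregate_pages(pages: Sequence[Vector]) -> Vector:
    """Pool per-page vectors into one document vector.

    Assumption: element-wise sum followed by L1 normalization, so documents
    of different lengths end up on the same scale.
    """
    summed = [sum(col) for col in zip(*pages)]
    total = sum(summed)
    return [v / total for v in summed] if total > 0 else summed


def nearest_centroid_log_probs(doc: Vector, centroids: Sequence[Vector],
                               scale: float = 10.0) -> List[float]:
    """Hypothetical toy classifier: log-softmax of negative squared distances."""
    logits = [-scale * sum((d - c) ** 2 for d, c in zip(doc, cen))
              for cen in centroids]
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_z for z in logits]


def segment_stream(pages: Sequence[Vector],
                   class_log_probs: Callable[[Vector], List[float]],
                   max_len: Optional[int] = None) -> List[Segment]:
    """Split a page stream so that the summed best-class log-probability of
    the resulting documents is maximal (dynamic programming over split points)."""
    n = len(pages)
    best = [-math.inf] * (n + 1)  # best[j]: best score for pages[:j]
    best[0] = 0.0
    back: List[Tuple[int, int]] = [(0, -1)] * (n + 1)  # (segment start, class)
    for end in range(1, n + 1):
        # Optionally cap document length to bound the quadratic search.
        first = 0 if max_len is None else max(0, end - max_len)
        for start in range(first, end):
            logp = class_log_probs(aggregate_pages(pages[start:end]))
            cls = max(range(len(logp)), key=logp.__getitem__)
            score = best[start] + logp[cls]
            if score > best[end]:
                best[end] = score
                back[end] = (start, cls)
    # Walk the back-pointers from the last page to recover the documents.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, cls = back[end]
        segments.append((start, end, cls))
        end = start
    segments.reverse()
    return segments


if __name__ == "__main__":
    # Toy stream: each page resembles class 0 or class 1 (hypothetical data).
    stream = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    print(segment_stream(stream, lambda d: nearest_centroid_log_probs(d, centroids)))
    # [(0, 2, 0), (2, 4, 1), (4, 5, 0)]
```

Because every extra segment adds another non-positive log-probability term, this particular scoring favors merging adjacent pages of the same class; two consecutive documents of the same class would be merged unless further cues (for example a document-length prior) are added to the model.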
Keywords :
computational complexity; document image processing; electronic mail; feature extraction; image classification; image segmentation; probability; statistical analysis; administrative multipage documents; computational cost; continuous page stream segmentation; digital mailroom applications; document classification; document representation; feature vector; multipage document classification tasks; multipage document sequence; plausible page stream segmentation; probability; statistical model; Histograms; Image segmentation; Probabilistic logic; Semantics; Training; Vectors; Visualization; Digital Mailroom; Document Classification; Multipage Document Segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.128
Filename :
6628693