DocumentCode :
2472140
Title :
A robust front page detection algorithm for large periodical collections
Author :
Konya, Iuliu ; Seibert, Christoph ; Glahn, Sebastian ; Eickeler, Stefan
Author_Institution :
Schloss Birlinghoven, Fraunhofer Inst. for Intell. Anal. & Inf. Syst. (IAIS), Sankt Augustin, Germany
fYear :
2008
fDate :
8-11 Dec. 2008
Firstpage :
1
Lastpage :
5
Abstract :
Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automatic segmentation of the document stream into issues heavily influence the overall output quality of a document image analysis system. As a solution to the issue segmentation problem, this paper introduces a robust, two-step front page detection algorithm. First, the salient connected components from the front page of the periodical are described using a multi-dimensional Gaussian distribution based on discrete cosine transform (DCT) features. Second, a graph model is computed by applying Delaunay triangulation on the selected set of components. A specialized, error-tolerant graph matching algorithm is used to compute the distance score between the model and each candidate page. Experiments on a large, real-world newspaper data set demonstrate the generality and effectiveness of the proposed method.
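The first step in the abstract (describing salient components with a Gaussian distribution over DCT features) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the patch size, the number of retained coefficients `k`, the diagonal covariance, and all function names (`dct2`, `dct_features`, `fit_gaussian`, `mahalanobis_sq`) are assumptions made for the example. The graph-matching step is not shown.

```python
import math

def dct2(block):
    """Unnormalized 2-D DCT-II of a square patch given as a list of lists."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = s
    return out

def dct_features(block, k=2):
    """Keep the k x k lowest-frequency coefficients as a feature vector (k is an assumption)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]

def fit_gaussian(samples, eps=1e-6):
    """Estimate per-dimension mean and variance.

    A diagonal covariance is assumed here for simplicity; eps keeps variances positive.
    """
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    var = [sum((s[i] - mean[i]) ** 2 for s in samples) / n + eps for i in range(d)]
    return mean, var

def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under the diagonal Gaussian; smaller means more similar."""
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))
```

In use, one would extract `dct_features` from the same component (for example, a masthead logo) on several known front pages, call `fit_gaussian` on those vectors, and score components on a candidate page with `mahalanobis_sq`.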
Keywords :
Gaussian distribution; discrete cosine transforms; document image processing; error correction; graph theory; image matching; image segmentation; mesh generation; object detection; DCT feature; Delaunay triangulation; automatic segmentation problem; discrete cosine transform feature; document image analysis system; document stream; error-correcting subgraph isomorphism algorithm; error-tolerant graph matching algorithm; graph model; large-scale digitization project; multidimensional Gaussian distribution; periodical collection; robust front page detection algorithm; salient connected component; Detection algorithms; Discrete cosine transforms; Gaussian distribution; Image analysis; Image segmentation; Pattern recognition; Robustness; Shape; Streaming media; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
ISSN :
1051-4651
Print_ISBN :
978-1-4244-2174-9
Electronic_ISBN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2008.4760966
Filename :
4760966