DocumentCode :
594808
Title :
Text/graphic separation using a sparse representation with multi-learned dictionaries
Author :
Do, Thanh-Ha ; Tabbone, Salvatore ; Ramos-Terrades, O.
Author_Institution :
Univ. de Lorraine-LORIA, Vandœuvre-lès-Nancy, France
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
689
Lastpage :
692
Abstract :
In this paper, we propose a new approach for extracting text regions from graphical documents. We first empirically construct two sequences of learned dictionaries, one for the text parts and one for the graphical parts. We then compute the sparse representations of non-overlapping document patches of different sizes in these learned dictionaries. Based on these representations, each patch is classified as text or graphic by comparing its reconstruction errors under the text and graphic dictionaries. Same-sized patches in one category are then merged to define the corresponding text or graphic layer, and these layers are combined to create a final text/graphic layer. Finally, in a post-processing step, text regions are further filtered using learned thresholds.
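The patch classification step described in the abstract can be illustrated with a minimal sketch. It assumes a patch is flattened into a vector and that each dictionary is a matrix whose columns are atoms. The abstract does not say which sparse coder is used, so a small greedy orthogonal matching pursuit stands in here. The names `omp`, `reconstruction_error`, `classify_patch` and the sparsity level `k` are hypothetical, and the multi-size patches, layer merging, and threshold-based post-processing are left out.

```python
import numpy as np


def omp(D, x, k):
    """Greedy orthogonal matching pursuit (an assumed stand-in sparse coder).

    Approximates x with at most k columns (atoms) of D and returns the
    coefficient vector, which is zero outside the selected atoms.
    """
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom improves the fit
        support.append(j)
        # Least-squares refit of x on all atoms selected so far.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def reconstruction_error(D, x, k):
    """Euclidean norm of x minus its k-sparse reconstruction in D."""
    return float(np.linalg.norm(x - D @ omp(D, x, k)))


def classify_patch(x, D_text, D_graphic, k=3):
    """Label a patch by whichever dictionary reconstructs it with less error."""
    e_text = reconstruction_error(D_text, x, k)
    e_graphic = reconstruction_error(D_graphic, x, k)
    return "text" if e_text < e_graphic else "graphic"


if __name__ == "__main__":
    # Toy dictionaries (assumption): disjoint halves of an 8-dimensional identity.
    D_text = np.eye(8)[:, :4]
    D_graphic = np.eye(8)[:, 4:]
    patch = np.array([2.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    print(classify_patch(patch, D_text, D_graphic, k=2))
```

In this toy setup the patch lies entirely in the span of the text atoms, so its reconstruction error under the text dictionary is zero and it is labeled as text. Real dictionaries would be learned from training patches, which this sketch does not attempt.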
Keywords :
computer graphics; data structures; dictionaries; document image processing; text analysis; graphic category classification; graphical documents; learned dictionaries sequences; multilearned dictionaries; nonoverlapped document patch; post-processing step; reconstruction errors; same-sized patches; sparse representation; text category classification; text region extraction; text-graphic separation; Algorithm design and analysis; Dictionaries; Equations; Graphics; Image reconstruction; Noise; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460228