DocumentCode :
2146220
Title :
Text Segmentation of Consumer Magazines in PDF Format
Author :
Fan, Jian
Author_Institution :
Hewlett-Packard Labs., Palo Alto, CA, USA
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
794
Lastpage :
798
Abstract :
Text segmentation is usually the first step taken towards the reuse and repurposing of PDF documents. Through experimental evaluation, we found that the leading text segmentation algorithms have limitations for contemporary consumer magazines. We propose a new local homogeneity measure based on line space, and incorporate this new feature into a region growing algorithm. Using a fixed set of parameters, our algorithm achieved robust performance on PDF magazines with wide-ranging layouts and styles.
Keywords :
document handling; text analysis; PDF document; PDF format; PDF magazine; contemporary consumer magazine; line space; local homogeneity measure; parameter set; region growing algorithm; text segmentation algorithm; Bismuth; Extraterrestrial measurements; Layout; Merging; Portable document format; Rendering (computer graphics); PDF analysis; page segmentation; text segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.163
Filename :
6065420