DocumentCode :
2016549
Title :
Unsupervised Newspaper Segmentation Using Language Context
Author :
Furmaniak, Ralph
Author_Institution :
Univ. of Waterloo, Waterloo
Volume :
2
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
1263
Lastpage :
1267
Abstract :
There has been increased interest in the digitization of newspaper archives. A major problem that must be solved is the high-accuracy decomposition of the page into its logical structure. In this paper I present an approach that uses a language similarity measure based on OCR results to train geometric layout rules tailored to an arbitrary title. Experiments have shown this approach to be very effective.
Keywords :
image segmentation; information retrieval systems; natural languages; optical character recognition; publishing; unsupervised learning; OCR result; arbitrary title; geometric layout rule training; language similarity measure; newspaper archive digitization; unsupervised newspaper segmentation; Books; Engines; Image processing; Image segmentation; Merging; Optical character recognition software; Testing; Text recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4377118
Filename :
4377118