DocumentCode :
384092
Title :
Newspaper headlines extraction from microfilm images
Author :
Liu, Qing Hong ; Tan, Chew Lim
Author_Institution :
Nat. Univ. of Singapore, Singapore
Volume :
3
fYear :
2002
fDate :
2002
Firstpage :
208
Abstract :
Automatic indexing is important for a digital library to provide digitized manuscripts of old document images and their electronic text. As an essential step in creating such a system, this paper addresses the extraction of headlines from old newspaper microfilms. Most research on document layout analysis assumes relatively clean images. However, microfilm images of old newspapers present a challenge: they are usually insufficiently illuminated and considerably dirty. Because conventional threshold selection techniques are inadequate for such images, we propose a new, effective method for separating characters from a noisy background. A Run Length Smearing Algorithm (RLSA) is applied in the headline extraction. An experiment shows that our approach improves the recall, precision and combined rates.
Keywords :
digital libraries; document image processing; indexing; microforms; optical character recognition; Run Length Smearing Algorithm; automatic indexing; digital library; digitized manuscripts; document images; document layout analysis; electronic text; experiment; microfilm images; newspaper headlines extraction; noisy background; Background noise; Data mining; Graphics; Histograms; Image analysis; Machine assisted indexing; Optical character recognition software; Printing; Software libraries; Text analysis
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2002. Proceedings. 16th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-1695-X
Type :
conf
DOI :
10.1109/ICPR.2002.1047831
Filename :
1047831