Title :
Newspaper headlines extraction from microfilm images
Author :
Liu, Qing Hong ; Tan, Chew Lim
Author_Institution :
Nat. Univ. of Singapore, Singapore
Abstract :
Automatic indexing is important for a digital library that provides digitized manuscripts of old documents together with their electronic text. As an essential step in creating such a system, this paper addresses the extraction of headlines from old newspaper microfilms. Most research on document layout analysis assumes relatively clean images. However, microfilm images of old newspapers present a challenge: they are usually insufficiently illuminated and considerably dirty. Because conventional threshold selection techniques are inadequate for these kinds of images, we propose a new method for separating characters from the noisy background. A Run Length Smearing Algorithm (RLSA) is applied to extract the headlines. An experiment shows that our approach improves the recall, precision and combined rates.
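For readers unfamiliar with the RLSA step named in the abstract, the following is a minimal sketch of horizontal run-length smearing on a binary image, not the authors' implementation. It assumes foreground pixels are 1 and background pixels are 0, and that only background gaps bounded by foreground on both sides are filled; the function names and the threshold value are hypothetical and do not come from the paper.

```python
def rlsa_row(row, threshold):
    """Smear one binary row (1 = foreground, 0 = background).

    Background gaps of at most `threshold` pixels that lie between two
    foreground pixels are filled with 1. Gaps touching the row border
    are left unchanged (an assumption; conventions differ).
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_fg - 1 if last_fg is not None else 0
            if 0 < gap <= threshold:
                # Short gap: merge the two neighbouring runs.
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply rlsa_row to every row of a binary image (list of lists)."""
    return [rlsa_row(row, threshold) for row in image]


if __name__ == "__main__":
    # The 2-pixel gap is filled; the 4-pixel gap is kept.
    print(rlsa_row([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], threshold=2))
```

Smearing of this kind merges neighbouring characters into connected blocks, which is why it is commonly used to group text into candidate regions such as headlines; a vertical pass can be obtained by applying the same function to the transposed image.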
Keywords :
digital libraries; document image processing; indexing; microforms; optical character recognition; Run Length Smearing Algorithm; automatic indexing; digital library; digitized manuscripts; document images; document layout analysis; electronic text; experiment; microfilm images; newspaper headlines extraction; noisy background; Background noise; Data mining; Graphics; Histograms; Image analysis; Machine assisted indexing; Optical character recognition software; Printing; Software libraries; Text analysis
Conference_Title :
Pattern Recognition, 2002. Proceedings. 16th International Conference on
Print_ISBN :
0-7695-1695-X
DOI :
10.1109/ICPR.2002.1047831