DocumentCode :
3145657
Title :
A simple and effective approach for border noise removal from document images
Author :
Shafait, Faisal ; Breuel, Thomas M.
Author_Institution :
Image Understanding & Pattern Recognition (IUPR) Res. Group, German Res. Center for Artificial Intell. (DFKI GmbH), Kaiserslautern, Germany
fYear :
2009
fDate :
14-15 Dec. 2009
Firstpage :
1
Lastpage :
5
Abstract :
When digitizing bound material such as books or magazines, marginal noise appears along the page border. This noise consists of unwanted text from the neighboring page and/or speckles that result from the binarization process. When a keyword-based search is performed on a digitized collection, textual noise is particularly problematic, since the returned results may correspond to textual noise rather than to the actual contents of the page. Manually removing marginal noise from each page is not feasible in large-scale digitization projects. In this paper, we present a simple and effective approach for removing both textual and non-textual noise by finding the borders of noise regions using projection profile analysis. We demonstrate the effectiveness of our approach by evaluating it quantitatively on the widely used University of Washington (UW3) dataset. The results show that our approach reduces the noise ratio from 70% to 20% while retaining more than 99% of the actual page contents. Comparison with state-of-the-art approaches shows that our algorithm performs comparably to them, while being simple to understand and easy to implement. We also provide an open-source implementation of our method as part of the OCRopus OCR system.
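The abstract names projection profile analysis but does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm or the OCRopus implementation. It assumes a binarized page stored as a NumPy array in which foreground (ink) pixels are true. The function names and the parameters `search_frac` and `max_ink` are hypothetical and chosen for illustration.

```python
import numpy as np

def projection_profiles(binary):
    """Count foreground pixels per row and per column of a binary page."""
    binary = np.asarray(binary, dtype=bool)
    return binary.sum(axis=1), binary.sum(axis=0)

def find_gap_border(profile, search_frac=0.25, max_ink=0):
    """Scan the leading `search_frac` of a profile (an assumed margin zone)
    and return the index just past the last empty position, or 0 if the
    zone contains no empty position."""
    limit = int(len(profile) * search_frac)
    empty = np.flatnonzero(profile[:limit] <= max_ink)
    return int(empty[-1]) + 1 if empty.size else 0

def crop_border_noise(binary, search_frac=0.25):
    """Blank everything outside the borders found in the two profiles."""
    binary = np.asarray(binary, dtype=bool)
    rows, cols = projection_profiles(binary)
    top = find_gap_border(rows, search_frac)
    left = find_gap_border(cols, search_frac)
    # Reverse the profiles to search from the bottom and right edges.
    bottom = len(rows) - find_gap_border(rows[::-1], search_frac)
    right = len(cols) - find_gap_border(cols[::-1], search_frac)
    cleaned = np.zeros_like(binary)
    cleaned[top:bottom, left:right] = binary[top:bottom, left:right]
    return cleaned, (top, bottom, left, right)

# Synthetic page: a text block plus a strip of noise along the left edge.
page = np.zeros((20, 40), dtype=bool)
page[5:15, 12:30] = True   # page contents
page[:, 0:3] = True        # marginal noise from the neighboring page
cleaned, box = crop_border_noise(page)
```

In this toy example the noise strip touches every row, so only the column profile reveals a gap; a practical method would need to handle such interactions between the two profiles, as well as skew and noise that touches the page contents.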
Keywords :
document image processing; image denoising; binarization process; border noise removal; digitized collection; document image; keyword based search; marginal noise removal; nontextual noise removal; projection profile analysis; Bars; Books; Character recognition; Cleaning; Filters; Image analysis; Optical character recognition software; Optical noise; Pattern recognition; Signal to noise ratio;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multitopic Conference, 2009. INMIC 2009. IEEE 13th International
Conference_Location :
Islamabad
Print_ISBN :
978-1-4244-4872-2
Electronic_ISBN :
978-1-4244-4873-9
Type :
conf
DOI :
10.1109/INMIC.2009.5383115
Filename :
5383115