DocumentCode :
2716125
Title :
Removing Noises Similar to Dots from Persian Scanned Documents
Author :
Shirali-Shahreza, M. Hassan ; Shirali-Shahreza, Sajad
Author_Institution :
Dept. of Comput. Eng., Yazd Univ., Yazd
Volume :
2
fYear :
2008
fDate :
3-4 Aug. 2008
Firstpage :
313
Lastpage :
317
Abstract :
Computers are now used in many aspects of human life, and one consequence is the spread of electronic documents. Computers cannot directly interpret written documents, so these must be converted to electronic form before they can be processed. One common method for converting written text to electronic text is Optical Character Recognition (OCR). Much work has been done on English OCR, but Persian/Arabic OCR is still under development. One of the major problems in Persian/Arabic OCR is noise removal. Because dots are very important in the Persian and Arabic languages and closely resemble noise, removing noise from Persian/Arabic documents is more difficult than from Latin documents. In this paper, we propose a new method for removing dot-like noise from printed Persian/Arabic documents. In this method, the size of the dots is estimated in each region after page segmentation. Noise that resembles dots is then removed using the estimated dot size. The method is implemented as part of the page segmentation phase, and the experimental results are good. Its advantages include high speed and strong resistance to skew.
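The abstract gives only the outline of the method (estimate the dot size per region, then remove dot-like noise using that estimate), so the following Python sketch is an illustration under stated assumptions rather than the authors' algorithm. It assumes a region is a binary image given as a list of rows of 0/1 values, that candidate dots are 8-connected components no larger than a hypothetical `max_dot_area`, that the dot size is the median area of those candidates, and that any component smaller than a hypothetical fraction `min_ratio` of that size is noise. All function names, parameters, and thresholds are invented for illustration.

```python
from collections import deque
from statistics import median


def connected_components(img):
    """Return the 8-connected foreground components as lists of (row, col)."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            comp = []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                # Visit the 8 neighbours of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps


def estimate_dot_area(comps, max_dot_area):
    """Assumed estimator: median area of components up to max_dot_area pixels."""
    small = [len(comp) for comp in comps if len(comp) <= max_dot_area]
    return median(small) if small else None


def remove_dot_like_noise(img, max_dot_area=16, min_ratio=0.5):
    """Return a copy of img with components much smaller than a dot erased."""
    comps = connected_components(img)
    dot_area = estimate_dot_area(comps, max_dot_area)
    cleaned = [row[:] for row in img]
    if dot_area is None:
        return cleaned  # no dot candidates in this region, nothing to remove
    for comp in comps:
        if len(comp) < min_ratio * dot_area:
            for y, x in comp:
                cleaned[y][x] = 0
    return cleaned
```

Because the sketch compares component areas only, it does not depend on the orientation of the text line, which is in the spirit of the skew resistance the abstract claims. A real implementation would run it once per segmented region, so that regions with different font sizes get their own dot-size estimate.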
Keywords :
document image processing; image denoising; image segmentation; natural languages; optical character recognition; Arabic OCR; Arabic languages; English OCR; Latin documents; OCR; Persian languages; Persian scanned documents; dot size estimation; electronic documents; noise removal; optical character recognition; page segmentation; written documents; Character recognition; Communication system control; Consumer electronics; Databases; Engineering management; Humans; Natural languages; Optical character recognition software; Optical noise; Technology management; Noise Removal; Optical Character Recognition (OCR); Pattern Recognition; Persian/Arabic OCR;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing, Communication, Control, and Management, 2008. CCCM '08. ISECS International Colloquium on
Conference_Location :
Guangzhou
Print_ISBN :
978-0-7695-3290-5
Type :
conf
DOI :
10.1109/CCCM.2008.246
Filename :
4609697