Title :
Persian/Arabic Text Font Estimation using Dots
Author :
Shirali-Shahreza, Mohammad Hassan ; Shirali-Shahreza, Sajad
Author_Institution :
Dept. of Comput. Eng., Yazd Univ.
Abstract :
Computers are now used in many aspects of human life, and one consequence is the spread of electronic documents. Computers cannot understand written documents directly, so these must first be converted to electronic documents before they can be processed. One common method for converting written text to electronic text is optical character recognition (OCR). A lot of work has been done on English OCR, but Persian/Arabic OCR is still under development. A step commonly used in the recognition part of an OCR system is estimating the font size of the text. Usually, once the font size has been found, the pen width is calculated. The pen width can then be used for character segmentation in Persian/Arabic OCR. A common way of estimating font size and pen width is to use the projection profile. In this paper, we introduce a new method for estimating the font size of printed Persian/Arabic texts. The method uses the dots in the text to estimate the font size. Because Persian/Arabic texts contain many dots, the method can estimate the font size precisely. One of the main advantages of our method is its strong resistance to skew.
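The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of measuring dots, not the authors' method. It assumes a binarized page given as a list of rows (1 for ink, 0 for background), labels 8-connected components, keeps the small, roughly square, well-filled ones as dot candidates, and reports their median side length in pixels. The function names and the thresholds `max_aspect` and `min_fill` are hypothetical choices made for illustration.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Return height, width and pixel count of each 8-connected ink region.

    `image` is a list of equal-length rows, where 1 is ink and 0 is background.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right, area = r, r, c, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append({"height": bottom - top + 1,
                               "width": right - left + 1,
                               "area": area})
    return components


def estimate_dot_size(image, max_aspect=1.5, min_fill=0.6):
    """Median side length (pixels) of dot-like components, or None if none."""
    sides = []
    for comp in connected_components(image):
        h, w = comp["height"], comp["width"]
        aspect = max(h, w) / min(h, w)   # near 1 for a square-ish blob
        fill = comp["area"] / (h * w)    # share of the bounding box that is ink
        if aspect <= max_aspect and fill >= min_fill:
            sides.append((h + w) / 2)
    return median(sides) if sides else None


# Toy page: two 2x2 dots above a long horizontal stroke.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(estimate_dot_size(image))  # 2.0
```

Mapping the measured dot size to a font size or pen width would need a calibration that the abstract does not give, so the sketch stops at the pixel measurement. Using the median rather than the mean is a design choice made here so that a few misclassified blobs have little effect on the estimate.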
Keywords :
optical character recognition; text analysis; Arabic text font estimation; OCR; Persian text font estimation; dots; electronic documents; pen width; Character recognition; Consumer electronics; Databases; Humans; Information technology; Life estimation; Natural languages; Optical character recognition software; Signal processing; US Department of Transportation
Conference_Title :
Signal Processing and Information Technology, 2006 IEEE International Symposium on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9753-3
Electronic_ISBN :
0-7803-9754-1
DOI :
10.1109/ISSPIT.2006.270838