Title :
A Skew Resistant Method for Persian Text Segmentation
Author :
Shirali-Shahreza, Sajad ; Manzuri-Shalmani, M.T. ; Shirali-Shahreza, M. Hassan
Author_Institution :
Dept. of Comput. Eng., Sharif Univ. of Technol., Tehran
Abstract :
OCR programs are one of the most effective ways to convert written and printed documents into digital form. The first phase of OCR is segmenting the input image and identifying its text and non-text regions. This paper proposes a new method for segmenting printed Persian text, based on the ink spread effect. Because the Persian script differs greatly from the English script, most methods proposed for English have not produced good results for Persian. The proposed method is designed around the special features of the Persian script. One of its most important characteristics is resistance to skew. Moreover, the approach is directly applicable to Arabic script.
Keywords :
document image processing; feature extraction; image segmentation; optical character recognition; text analysis; Arabic scripts; Persian document; Persian scripts; Persian text segmentation; ink spread effect; page segmentation; printed documents; skew resistant method; text identification; written documents; Character recognition; Computational intelligence; Design methodology; Gray-scale; Ink; Optical character recognition software; Optical signal processing; Signal processing; Signal processing algorithms; Optical Character Recognition (OCR)
Conference_Title :
IEEE Symposium on Computational Intelligence in Image and Signal Processing (CIISP 2007)
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0707-9
DOI :
10.1109/CIISP.2007.369303