DocumentCode :
2015573
Title :
A Generalized Thinning Algorithm for Cursive and Non-Cursive Language Scripts
Author :
Shaikh, Noor Ahmed ; Shaikh, Zubair A.
Author_Institution :
Shah Abdul Latif Univ., Khairpur
fYear :
2005
fDate :
24-25 Dec. 2005
Firstpage :
1
Lastpage :
4
Abstract :
One of the most crucial phases of text recognition is thinning characters down to a single-pixel-wide representation. A thinning algorithm is judged by how well its output, the unit-width skeleton, retains the original character shape. No universally agreed thinning algorithm exists for producing character skeletons across different languages, even though thinning is a preprocessing step for all subsequent phases of character recognition, such as segmentation, feature extraction, and classification. Based on their intrinsic properties, written natural languages can be classified as cursive or non-cursive. Thinning is more complex for cursive scripts such as Arabic, Sindhi, and Urdu, because their characters have connected, non-isolated boundaries and complex shapes. Thinning algorithms lend themselves to parallel implementation: generating skeletons by matching selected pixel arrangement grid templates, rather than other pixel patterns, exploits parallelism. The key to success is choosing pixel arrangement grids that reduce the cost of the iterations needed to decide, for each pixel, whether to remove it or keep it. This paper presents an improved parallel thinning algorithm that extends easily to cursive and non-cursive languages alike by introducing a modified set of preservation rules expressed as pixel arrangement grid templates, making it both robust to noise and fast. Experimental results show its success on cursive languages such as Arabic, Sindhi, and Urdu, on non-cursive languages such as English and Chinese, and even on numerals, suggesting that it may serve as a universal thinning algorithm.
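The abstract describes thinning as repeated, parallel removal of boundary pixels, where each deletion decision depends only on a pixel's 3×3 neighbourhood. The paper's own grid templates and preservation rules are not given in this record, so the sketch below substitutes the well-known Zhang–Suen parallel thinning algorithm purely to illustrate the general technique; it is not the authors' method. The function names `zhang_suen_thin` and `_neighbours` are hypothetical, and the sketch assumes a binary image stored as a list of lists with 1 for ink and 0 for background.

```python
from typing import List

Grid = List[List[int]]

# Clockwise offsets for neighbours P2..P9, starting with the pixel directly above.
_OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def _neighbours(img: Grid, r: int, c: int) -> List[int]:
    """Return [P2, ..., P9] around (r, c); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    out = []
    for dr, dc in _OFFSETS:
        rr, cc = r + dr, c + dc
        out.append(img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
    return out


def zhang_suen_thin(image: Grid) -> Grid:
    """Thin a non-empty binary image with the Zhang-Suen parallel algorithm.

    Each sub-iteration decides deletions from an unchanged snapshot of the
    image and then applies them all at once, which is what makes the method
    parallel: every pixel's decision is independent of the others.
    """
    img = [row[:] for row in image]  # never mutate the caller's image
    h, w = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(h):
                for c in range(w):
                    if img[r][c] != 1:
                        continue
                    p = _neighbours(img, r, c)  # p[0] = P2 ... p[7] = P9
                    b = sum(p)  # number of ink neighbours
                    # a = number of 0 -> 1 transitions around P2, P3, ..., P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:  # remove south-east boundary and north-west corners
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:  # remove north-west boundary and south-east corners
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            for r, c in to_delete:  # apply all deletions simultaneously
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

For example, a solid bar three pixels thick is reduced to a one-pixel-wide horizontal line, while a line that is already one pixel wide passes through unchanged because its endpoints have only one ink neighbour and its interior pixels would break connectivity if removed. Zhang–Suen is known to erase some small shapes (such as 2×2 squares) entirely, which is the kind of case that additional preservation rules are meant to handle.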
Keywords :
computational complexity; image classification; image thinning; natural language processing; optical character recognition; parallel algorithms; OCR; character recognition; classification; cursive language scripts; generalized universal thinning algorithm; improved parallel thinning algorithm; noncursive language scripts; pixel arrangement grid templates; preservation rules; text recognition; unit-width skeletons; Character generation; Character recognition; Costs; Feature extraction; Mesh generation; Natural languages; Parallel programming; Shape measurement; Skeleton; Text recognition; Sindhi OCR; Thinning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
9th International Multitopic Conference, IEEE INMIC 2005
Conference_Location :
Karachi
Print_ISBN :
0-7803-9429-1
Electronic_ISBN :
0-7803-9430-5
Type :
conf
DOI :
10.1109/INMIC.2005.334387
Filename :
4133402