Regional Information Center for Science and Technology - A Generalized Thinning Algorithm for Cursive and Non-Cursive Language Scripts

DocumentCode :

2015573

Title :

A Generalized Thinning Algorithm for Cursive and Non-Cursive Language Scripts

Author :

Shaikh, Noor Ahmed ; Shaikh, Zubair A.

Author_Institution :

Shah Abdul Latif Univ., Khairpur

fYear :

2005

fDate :

24-25 Dec. 2005

Firstpage :

Lastpage :

Abstract :

One of the most crucial phases of text recognition is thinning characters to a single-pixel-wide representation. The success of a thinning algorithm is measured by how well its output, also called a unit-width skeleton, retains the original character shape. No universally agreed thinning algorithm exists for producing character skeletons across different languages, even though thinning is a preprocessing step for all subsequent phases of character recognition, such as segmentation, feature extraction, and classification. Written natural languages can be classified, by their intrinsic properties, as cursive or non-cursive. Applying thinning algorithms to cursive languages such as Arabic, Sindhi, and Urdu is more complex because of their non-isolated character boundaries and complex character shapes. Such algorithms can easily be extended for parallel implementations. Selecting certain pixel arrangement grid templates over other pixel patterns when generating character skeletons exploits parallel programming. The key to success is determining the right pixel arrangement grids, which reduce the cost of the iterations needed to decide whether each pixel is thinned or ignored. This paper presents an improved parallel thinning algorithm that can be easily extended to cursive and non-cursive languages alike by introducing a modified set of preservation rules via pixel arrangement grid templates, making it both robust to noise and fast. Experimental results show its success on cursive languages such as Arabic, Sindhi, and Urdu, on non-cursive languages such as English and Chinese, and even on numerals, making it a probable universal thinning algorithm.
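
This record does not reproduce the paper's pixel arrangement grid templates or its preservation rules, so they cannot be shown here. As a rough illustration of what an iterative parallel thinning scheme looks like, the sketch below uses the classic Zhang-Suen rules instead. It is a swapped-in technique, not the authors' algorithm, and the function name `zhang_suen_thin` and the list-of-rows image format are assumptions made for the example.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) with the Zhang-Suen rules.

    Illustrative stand-in only: this is NOT the algorithm of the paper
    described in this record.
    """
    img = [list(row) for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    # Offsets of neighbours P2..P9, clockwise, starting directly above.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(r, c):
        # Pixels outside the image are treated as background (0).
        return [
            img[r + dr][c + dc]
            if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in offsets
        ]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # two sub-iterations per pass
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    b = sum(n)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(
                        1 for i in range(8)
                        if n[i] == 0 and n[(i + 1) % 8] == 1
                    )
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((r, c))
            # Apply all deletions together, after every pixel was tested
            # against the same snapshot of the image.
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img


# A 3-pixel-thick horizontal bar on a 5 x 9 background.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen_thin(bar)
for row in skeleton:
    print("".join("#" if p else "." for p in row))
```

Because each pixel is tested against the same snapshot of the image and deletions are applied together at the end of a sub-iteration, the per-pixel test has no dependence on the order of evaluation, which is what makes schemes of this kind candidates for parallel implementation.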

Keywords :

computational complexity; image classification; image thinning; natural language processing; optical character recognition; parallel algorithms; OCR; character recognition; classification; cursive language scripts; generalized universal thinning algorithm; improved parallel thinning algorithm; noncursive language scripts; pixel arrangement grid templates; preservation rules; text recognition; unit-width skeletons; Character generation; Character recognition; Costs; Feature extraction; Mesh generation; Natural languages; Parallel programming; Shape measurement; Skeleton; Text recognition; Sindhi OCR; Thinning

fLanguage :

English

Publisher :

IEEE

Conference_Title :

9th International Multitopic Conference, IEEE INMIC 2005

Conference_Location :

Karachi

Print_ISBN :

0-7803-9429-1

Electronic_ISBN :

0-7803-9430-5

Type :

conf

DOI :

10.1109/INMIC.2005.334387

Filename :

4133402

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2015573