Regional Information Center for Science and Technology (RICEST) - Monothetic separation of Telugu, Hindi and English text lines from a multi script document

DocumentCode :

2566853

Title :

Monothetic separation of Telugu, Hindi and English text lines from a multi script document

Author :

Padma, M.C. ; Vijaya, P.A.

Author_Institution :

Dept. of E. & C. Eng., Malnad Coll. of Eng., Hassan, India

fYear :

2009

fDate :

11-14 Oct. 2009

Firstpage :

4870

Lastpage :

4875

Abstract :

In a multi-script, multilingual environment, a document may contain text lines in more than one script or language. The different script regions of the document must be identified so that each region can be fed to the OCR system for its language. In this context, this paper proposes a monothetic algorithmic model to identify and separate text lines of Telugu, Hindi and English scripts from a printed multilingual document. The proposed method uses the distinct features of the target script and searches for the text lines that possess the anticipated features. The experiments used 1500 text lines for learning and 900 text lines for testing. The reported performance is 98.5%.
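
Illustrative sketch (not taken from the paper): the abstract does not list the features or thresholds used, so the Python below only shows the general shape of a monothetic decision, in which each rule tests a single feature and a line is assigned to the first script whose anticipated feature it possesses. The functions headline_ratio and vertical_stroke_ratio, the rule order, and every threshold are hypothetical. The only outside fact relied on is that Devanagari (Hindi) words carry a horizontal headline along the top.

```python
from typing import Callable, List, Sequence, Tuple

# A binarized text line: rows of pixels, 1 = ink, 0 = background.
Line = Sequence[Sequence[int]]


def headline_ratio(line: Line) -> float:
    """Longest horizontal ink run in the upper half, as a fraction of line width."""
    width = len(line[0])
    best = 0
    for row in line[: max(1, len(line) // 2)]:
        run = 0
        for pixel in row:
            run = run + 1 if pixel else 0
            best = max(best, run)
    return best / width


def vertical_stroke_ratio(line: Line) -> float:
    """Fraction of columns that are at least 80% ink (hypothetical feature)."""
    height, width = len(line), len(line[0])
    tall = sum(
        1 for c in range(width)
        if sum(row[c] for row in line) >= 0.8 * height
    )
    return tall / width


# Each rule tests exactly one feature (monothetic). Thresholds are assumptions.
Rule = Tuple[str, Callable[[Line], bool]]
RULES: List[Rule] = [
    ("Hindi", lambda l: headline_ratio(l) >= 0.5),
    ("English", lambda l: vertical_stroke_ratio(l) >= 0.1),
]


def classify_line(line: Line, rules: Sequence[Rule] = RULES) -> str:
    """Return the first script whose single-feature test passes, else 'unassigned'."""
    for script, has_feature in rules:
        if has_feature(line):
            return script
    return "unassigned"
```

A Telugu rule is deliberately left out because the abstract names no Telugu feature; it could be supplied through the rules argument once a suitable single feature is chosen.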

Keywords :

document image processing; optical character recognition; text analysis; English text line; monothetic algorithm; monothetic separation; multi script document; multilingual document; script/language form; Context modeling; Cybernetics; Educational institutions; Feeds; Image analysis; Natural languages; Optical character recognition software; Text analysis; Text recognition; USA Councils; Feature extraction; Monothetic Classifier; Multi-script multi-lingual document; Script Identification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on

Conference_Location :

San Antonio, TX

ISSN :

1062-922X

Print_ISBN :

978-1-4244-2793-2

Electronic_ISBN :

1062-922X

Type :

conf

DOI :

10.1109/ICSMC.2009.5346045

Filename :

5346045

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2566853