Regional Information Center for Science and Technology - Script line separation from Indian multi-script documents

DocumentCode :

3141549

Title :

Script line separation from Indian multi-script documents

Author :

Pal, U. ; Chaudhuri, B.B.

Author_Institution :

Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India

fYear :

1999

fDate :

20-22 Sep 1999

Firstpage :

406

Lastpage :

409

Abstract :

In a multi-lingual country like India, a document page may contain more than one script form. Under the three-language formula, the document may be printed in English, Devnagari and one of the other official Indian languages. For OCR of such a document page, it is necessary to separate these three script forms before feeding them to the OCRs of individual scripts. In this paper, an automatic technique of separating the text lines using script characteristics and shape based features is presented. At present, the system has an overall accuracy of about 98.5%.

Keywords :

document image processing; image segmentation; optical character recognition; Devnagari; English; Indian languages; Indian multi-script documents; OCR; document page; script form; script line separation; shape based features; text lines; three-language formula; Character generation; Computer vision; Natural languages; Optical character recognition software; Optical filters; Pattern recognition; Read only memory; Shape; Writing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on

Conference_Location :

Bangalore

Print_ISBN :

0-7695-0318-7

Type :

conf

DOI :

10.1109/ICDAR.1999.791810

Filename :

791810

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3141549