Regional Information Center for Science and Technology - Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line

DocumentCode :

1583338

Title :

Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line

Author :

Pal, U. ; Chaudhuri, B.B.

Author_Institution :

Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta, India

fYear :

2001

fDate :

2001

Firstpage :

790

Lastpage :

794

Abstract :

In a general situation, a document page may contain several script forms. For optical character recognition (OCR) of such a document page, it is necessary to separate the scripts before feeding them to their individual OCR systems. An automatic technique for the identification of printed Roman, Chinese, Arabic, Devnagari and Bangla text lines from a single document is proposed. Shape-based features, statistical features and some features obtained from the concept of a water reservoir are used for script identification. The proposed scheme has an accuracy of about 97.33%.

Keywords :

document image processing; feature extraction; natural languages; optical character recognition; Arabic; Bangla script; Chinese; Devnagari; English; OCR systems; automatic script line identification; automatic technique; document page; printed Roman text; printed text line identification; script forms; shape based features; statistical features; water reservoir; Computer vision; Fractals; Optical character recognition software; Optical devices; Pattern recognition; Probability; Reservoirs; Shape; Water resources; Water storage

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the Sixth International Conference on Document Analysis and Recognition, 2001

Conference_Location :

Seattle, WA

Print_ISBN :

0-7695-1263-1

Type :

conf

DOI :

10.1109/ICDAR.2001.953896

Filename :

953896

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1583338