DocumentCode :
1583338
Title :
Automatic identification of English, Chinese, Arabic, Devnagari and Bangla script line
Author :
Pal, U. ; Chaudhuri, B.B.
Author_Institution :
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
fYear :
2001
fDate :
2001
Firstpage :
790
Lastpage :
794
Abstract :
In general, a document page may contain text in several script forms. For optical character recognition (OCR) of such a page, the scripts must be separated before each is fed to its own OCR system. An automatic technique is proposed for identifying printed Roman, Chinese, Arabic, Devnagari and Bangla text lines within a single document. Shape-based features, statistical features and features derived from the concept of a water reservoir are used for script identification. The proposed scheme has an accuracy of about 97.33%.
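The water reservoir idea can be illustrated with a small sketch. The abstract does not say how reservoirs are computed, so the code below is an assumption and not the authors' algorithm. It uses a simplified gravity model: water poured from the top moves between 4-connected background pixels and drains away at the left, right or bottom edge of a one-pixel padded frame. The function name `top_reservoir_mask` is hypothetical. It marks the background pixels of a binary component (1 = ink) that would hold water, so a U-shaped stroke holds water while an inverted U or a closed loop does not.

```python
from collections import deque


def top_reservoir_mask(img):
    """Mark background pixels that would hold water poured from the top.

    img: list of equal-length rows, 1 = ink, 0 = background.
    Returns a same-sized mask (1 = water-holding pixel).
    Hypothetical, simplified gravity model (not the paper's algorithm):
    water moves between 4-connected background pixels and drains at the
    left, right or bottom edge of a 1-pixel background frame.
    """
    h, w = len(img), len(img[0])
    # Pad with a background frame so water can flow around the shape.
    H, W = h + 2, w + 2
    grid = [[0] * W for _ in range(H)]
    for r in range(h):
        for c in range(w):
            grid[r + 1][c + 1] = img[r][c]

    def neighbours(r, c):
        # 4-connected background neighbours inside the padded grid.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and grid[nr][nc] == 0:
                yield nr, nc

    # Water poured from the top reaches every background pixel connected
    # to the top edge; enclosed holes (e.g. inside an "O") stay dry.
    reached = {(0, c) for c in range(W)}
    queue = deque(reached)
    while queue:
        for nb in neighbours(*queue.popleft()):
            if nb not in reached:
                reached.add(nb)
                queue.append(nb)

    def drains(r0, c0):
        # Water at row r0 escapes if it can reach the left, right or bottom
        # edge without climbing above row r0 (smaller row index = higher).
        seen = {(r0, c0)}
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            if c == 0 or c == W - 1 or r == H - 1:
                return True
            for nr, nc in neighbours(r, c):
                if nr >= r0 and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        return False

    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            p = (r + 1, c + 1)
            if img[r][c] == 0 and p in reached and not drains(*p):
                mask[r][c] = 1
    return mask
```

For the U shape `[[1,0,0,0,1],[1,0,0,0,1],[1,1,1,1,1]]` the mask marks the six interior pixels, while an inverted U or a closed loop yields an empty mask. Quantities such as reservoir area, depth or position could then be used as features, though which reservoir features the authors actually use is not stated here. The per-pixel search is written for clarity rather than speed.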
Keywords :
document image processing; feature extraction; natural languages; optical character recognition; Arabic; Bangla script; Chinese; Devnagari; English; OCR systems; automatic script line identification; automatic technique; document page; optical character recognition; printed Roman text; printed text line identification; script forms; shape based features; statistical features; water reservoir; Computer vision; Fractals; Optical character recognition software; Optical devices; Pattern recognition; Probability; Reservoirs; Shape; Water resources; Water storage;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
Type :
conf
DOI :
10.1109/ICDAR.2001.953896
Filename :
953896