DocumentCode :
3141549
Title :
Script line separation from Indian multi-script documents
Author :
Pal, U. ; Chaudhuri, B.B.
Author_Institution :
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
fYear :
1999
fDate :
20-22 Sep 1999
Firstpage :
406
Lastpage :
409
Abstract :
In a multi-lingual country like India, a document page may contain more than one script form. Under the three-language formula, the document may be printed in English, Devnagari and one of the other official Indian languages. For OCR of such a document page, it is necessary to separate these three script forms before feeding them to the OCRs of individual scripts. In this paper, an automatic technique of separating the text lines using script characteristics and shape-based features is presented. At present, the system has an overall accuracy of about 98.5%.
Keywords :
document image processing; image segmentation; optical character recognition; Devnagari; English; Indian languages; Indian multi-script documents; OCR; document page; script form; script line separation; shape based features; text lines; three-language formula; Character generation; Computer vision; Natural languages; Optical character recognition software; Optical filters; Pattern recognition; Read only memory; Shape; Writing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
Type :
conf
DOI :
10.1109/ICDAR.1999.791810
Filename :
791810