DocumentCode :
2959576
Title :
Automatic language identification of bilingual English and Farsi scripts
Author :
Rezaee, Hamideh ; Geravanchizadeh, Masoud ; Razzazi, Farbod
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Tabriz, Tabriz, Iran
fYear :
2009
fDate :
14-16 Oct. 2009
Firstpage :
1
Lastpage :
4
Abstract :
Printed documents may contain several different languages. To apply Optical Character Recognition (OCR) to multilingual documents, it is therefore necessary to separate these languages automatically. In this paper, we describe a method for identifying printed Farsi and English text in document images at the line and word levels. The proposed algorithm is based on statistical and shape-based features. The accuracy of this method is approximately 96.05%.
Keywords :
document image processing; optical character recognition; English text identification; Farsi scripts identification; automatic language identification; line level document; word level document; Character recognition; Distribution functions; Image converters; Image segmentation; Machine vision; Natural languages; Optical character recognition software; Optical filters; Shape; Text recognition; Document Image Processing; Language Identification; Multilingual Scripts; OCR
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Application of Information and Communication Technologies, 2009. AICT 2009. International Conference on
Conference_Location :
Baku
Print_ISBN :
978-1-4244-4739-8
Electronic_ISBN :
978-1-4244-4740-4
Type :
conf
DOI :
10.1109/ICAICT.2009.5372532
Filename :
5372532