Title :
Discrimination between Arabic and Latin from bilingual documents
Author :
Haboubi, Sofiene ; Maddouri, Samia Snoussi ; Amiri, Hamid
Author_Institution :
Syst. & Signal Process. Lab., Nat. Eng. Sch. of Tunis, Tunis, Tunisia
Abstract :
An important task in machine learning is the electronic reading of documents. Within this process, discriminating between languages is one of the first steps of automatic document text recognition. We are interested in processing mixed Arabic/Latin printed documents. Our method is based essentially on word extraction: we first extract the words, then extract their structural features, and finally recognize the writing language. We present the results of our classification approach and discuss possible improvements.
Keywords :
learning (artificial intelligence); natural language processing; text analysis; Arabic/Latin printed document; automatic document text recognition; bilingual document; electronic document reading; machine learning; structural features extraction; Character recognition; Feature extraction; Gabor filters; IEEE Computer Society; Optical character recognition software; Text analysis; USA Councils; Language identification; structural features; word extraction
Conference_Title :
Communications, Computing and Control Applications (CCCA), 2011 International Conference on
Conference_Location :
Hammamet
Print_ISBN :
978-1-4244-9795-9
DOI :
10.1109/CCCA.2011.6031496