DocumentCode
3469230
Title
Discrimination between Arabic and Latin from bilingual documents
Author
Haboubi, Sofiene ; Maddouri, Samia Snoussi ; Amiri, Hamid
Author_Institution
Syst. & Signal Process. Lab., Nat. Eng. Sch. of Tunis, Tunis, Tunisia
fYear
2011
fDate
3-5 March 2011
Firstpage
1
Lastpage
6
Abstract
An important task in machine learning is the electronic reading of documents. Discriminating between languages is one of the first steps in automatic document text recognition. We address the processing of mixed Arabic/Latin printed documents. Our method is based primarily on word extraction: we first extract structural features of each word and then recognize the language in which it is written. Finally, we present the results of our classification approach and discuss possible improvements.
Keywords
learning (artificial intelligence); natural language processing; text analysis; Arabic/Latin printed document; automatic document text recognition; bilingual document; electronic document reading; machine learning; structural features extraction; Character recognition; Feature extraction; Gabor filters; IEEE Computer Society; Optical character recognition software; Text analysis; USA Councils; Language identification; structural features; word extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Communications, Computing and Control Applications (CCCA), 2011 International Conference on
Conference_Location
Hammamet
Print_ISBN
978-1-4244-9795-9
Type
conf
DOI
10.1109/CCCA.2011.6031496
Filename
6031496