  • DocumentCode
    3469230
  • Title
    Discrimination between Arabic and Latin from bilingual documents
  • Author
    Haboubi, Sofiene; Maddouri, Samia Snoussi; Amiri, Hamid
  • Author_Institution
    Syst. & Signal Process. Lab., Nat. Eng. Sch. of Tunis, Tunis, Tunisia
  • fYear
    2011
  • fDate
    3-5 March 2011
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The electronic reading of documents is an important task in machine learning. In this process, discriminating between languages is one of the first steps in automatic document text recognition. We are interested in processing mixed Arabic/Latin printed documents. Our method is based essentially on word extraction: we first extract structural features from each word and then identify the language in which it is written. Finally, we present the results of our classification approach and discuss possible improvements.
  • Keywords
    learning (artificial intelligence); natural language processing; text analysis; Arabic/Latin printed document; automatic document text recognition; bilingual document; electronic document reading; machine learning; structural features extraction; Character recognition; Feature extraction; Gabor filters; IEEE Computer Society; Optical character recognition software; Text analysis; USA Councils; Language identification; structural features; word extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications, Computing and Control Applications (CCCA), 2011 International Conference on
  • Conference_Location
    Hammamet
  • Print_ISBN
    978-1-4244-9795-9
  • Type
    conf
  • DOI
    10.1109/CCCA.2011.6031496
  • Filename
    6031496