• DocumentCode
    2259760
  • Title

    Comparison between two Arabic tagsets

  • Author

    Rashwan, Mohsen A A ; Khalil, Enas A H ; Rafea, Ahmed

  • Author_Institution
    Dept. of Electron. & Electr. Commun., Cairo Univ, Cairo, Egypt
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Enhancing Arabic tagging is of great importance in many NLP applications. This paper presents a simple comparison tool that compares two powerful tagging systems for Arabic. The first is the ASVM Tagger by Diab, M. et al. The second is the RDI Arab Tagger (ArabTagger), which relies on simple but powerful long n-gram probability estimation plus the A* search algorithm for disambiguation. The comparison is done to bring the points of excellence of the ArabTagger into the ASVM Tagger. Based on this comparison, a mapper tool is implemented to convert from the fine-grained Arab tagset (62 tags, used by the ArabTagger) to the coarse-grained, compact Reduced Tagset (RTS) of 24 tags used by the ASVM Tagger. A combined system is then formed from the output of both taggers. In our experiment it gives a higher average accuracy than ASVM alone: 95% for the hybrid system versus 93% for the ASVM system.
  • Keywords
    natural language processing; probability; support vector machines; tree searching; A* search algorithm; ASVM Tagger; ArabTagger; Arabic tagging; NLP applications; RDI Arab Tagger; long n-grams probability estimation; reduced tagset; Application software; Computer science; Data mining; Labeling; Machine learning; Natural languages; Power engineering and energy; Speech processing; Tagging; Automatic Support Vector Machine (ASVM); N-gram model; Part-of-Speech Tagging (POS); Reduced Tag Set (RTS)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313767
  • Filename
    5313767