  • DocumentCode
    600217
  • Title
    Weirdness Coefficient as a Feature Selection Method for Arabic Special Domain Text Classification
  • Author
    Al-Thubaity, Abdulmohsen ; Alanazi, Ayidh ; Hazzaa, I. ; Al-Tuwaijri, Haya
  • Author_Institution
    Comput. Res. Inst., King Abdulaziz City for Sci. & Technol., Riyadh, Saudi Arabia
  • fYear
    2012
  • fDate
    13-15 Nov. 2012
  • Firstpage
    69
  • Lastpage
    72
  • Abstract
    Given the importance of organizing and managing the rapidly growing body of Arabic electronic content, this study introduces the Weirdness Coefficient (W) as a new feature selection method for Arabic special-domain text classification. The proposed method was used to classify a dataset covering five Islamic topics with Naive Bayes (NB) and K-nearest neighbor (K-NN) classifiers under three representation schemes, and the results were compared with those of a well-known feature selection method, Chi-squared. Besides being simple to compute, the Weirdness Coefficient showed promising classification accuracy.
  • Keywords
    pattern classification; text analysis; Arabic electronic content; Arabic special domain text classification; Islamic topics; K-NN; K-nearest neighbor classifiers; NB; Naïve Bayes classifiers; feature selection method; weirdness coefficient; Accuracy; Classification algorithms; Computers; Educational institutions; Electronic mail; Text categorization; Arabic text classification; K-NN; NB; Weirdness Coefficient; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asian Language Processing (IALP), 2012 International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4673-6113-2
  • Electronic_ISBN
    978-0-7695-4886-9
  • Type
    conf
  • DOI
    10.1109/IALP.2012.64
  • Filename
    6473698
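The abstract names the Weirdness Coefficient (W) but does not state its formula or how features were chosen from it. The Python sketch below assumes the commonly used definition: a term's relative frequency in the special-domain text divided by its relative frequency in a general reference corpus. It also assumes three things not taken from the paper: add-k smoothing on the general-corpus side, a per-class top-k union as the selection policy, and a document-frequency 2x2 Chi-squared score for the comparison the abstract mentions. The function names `weirdness`, `chi_squared`, `top_k` and `select_by_weirdness` are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def weirdness(special: Sequence[str], general: Sequence[str],
              smoothing: float = 1.0) -> Dict[str, float]:
    """Weirdness coefficient W(t) for every term in the special-domain text.

    W(t) = (f_s(t) / N_s) / (f_g(t) / N_g): relative frequency in the
    special-domain text over relative frequency in a general corpus.
    Assumption: add-k smoothing on the general side so that terms absent
    from the general corpus do not cause a division by zero.
    """
    f_s, f_g = Counter(special), Counter(general)
    n_s = len(special)
    vocab = len(set(f_s) | set(f_g))
    n_g = len(general) + smoothing * vocab  # smoothed general-corpus size
    return {t: (c / n_s) / ((f_g[t] + smoothing) / n_g) for t, c in f_s.items()}


def chi_squared(docs: Sequence[Set[str]], labels: Sequence[str],
                term: str, cls: str) -> float:
    """Chi-squared statistic for (term, class) from a 2x2 document table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has, in_cls = term in doc, label == cls
        if has and in_cls:
            a += 1  # term present, document in the class
        elif has:
            b += 1  # term present, document in another class
        elif in_cls:
            c += 1  # term absent, document in the class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def select_by_weirdness(class_tokens: Dict[str, List[str]],
                        general: List[str], k: int) -> Set[str]:
    """Union of the top-k W terms of each class (hypothetical policy)."""
    selected: Set[str] = set()
    for tokens in class_tokens.values():
        selected.update(top_k(weirdness(tokens, general), k))
    return selected


if __name__ == "__main__":
    general = ["the", "the", "the", "salat"]
    classes = {"prayer": ["salat", "salat", "the"],
               "alms": ["zakat", "zakat", "the"]}
    print(select_by_weirdness(classes, general, k=1))
```

In this toy example, common words such as "the" get a low W because they are frequent in the general corpus. Domain terms that are over-represented in a class rise to the top. The appeal of W noted in the abstract is that computing it needs only two frequency counts per term, whereas Chi-squared needs a full term-by-class contingency table.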