  • DocumentCode
    2086908
  • Title
    Comparing Dimension Reduction Techniques for Arabic Text Classification Using BPNN Algorithm
  • Author
    Harrag, Fouzi; El-Qawasmah, Eyas; Al-Salman, Abdul Malik S.
  • Author_Institution
    Comput. Sci. Dept, Farhat ABBAS Univ., Setif, Algeria
  • fYear
    2010
  • fDate
    5-7 Aug. 2010
  • Firstpage
    6
  • Lastpage
    11
  • Abstract
    Dimensionality reduction is an essential task for many large-scale information processing problems, such as classifying document sets and searching over Web data sets. It can improve both the efficiency and the effectiveness of classifiers. In this paper, we conduct a comparative study of five dimension reduction techniques for Arabic text classification, using an in-house Arabic dataset. We evaluated and compared Stemming, Light-Stemming, Document Frequency (DF), TFIDF, and Latent Semantic Indexing (LSI) as methods for reducing the feature space to a much lower-dimensional input space for a back-propagation neural network (BPNN) classifier. The results showed that the proposed model achieved high categorization effectiveness, as measured by the macro-averaged F1 measure. Experiments on Arabic datasets indicate that DF, TFIDF, and LSI are favorable in terms of both effectiveness and efficiency when compared with the other two methods (Stemming and Light-Stemming).
  • Keywords
    backpropagation; neural nets; pattern classification; text analysis; Arabic dataset; Arabic text classification; BPNN algorithm; Web data sets; comparing dimension reduction techniques; neural network; Artificial neural networks; Classification algorithms; Large scale integration; Matrix decomposition; Support vector machine classification; Text categorization; Training; Arabic Text Categorization; Back-Propagation Neural Network; Dimensionality Reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Integrated Intelligent Computing (ICIIC), 2010 First International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    978-1-4244-7963-4
  • Electronic_ISBN
    978-0-7695-4152-5
  • Type
    conf
  • DOI
    10.1109/ICIIC.2010.23
  • Filename
    5572644