  • DocumentCode
    3413554
  • Title
    Stemming and similarity measures for Arabic Documents Clustering
  • Author
    Froud, H. ; Benslimane, Rachid ; Lachkar, Abdelhamid ; Ouatik, S.A.
  • Author_Institution
    L.T.T.I, Univ. Sidi Mohamed Ben, Fez, Morocco
  • fYear
    2010
  • fDate
    Sept. 30 2010-Oct. 2 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in the Arabic language. Document clustering aims to automatically group similar documents into the same cluster using different similarity/distance measures. In this paper, we evaluate the impact of stemming on Arabic text document clustering on the test dataset with five similarity/distance measures: Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient and Averaged Kullback-Leibler Divergence. Our experiments on this dataset show that stemming does not yield good clustering results, but it makes the document representation smaller and the clustering faster.
  • Keywords
    information retrieval systems; natural languages; pattern clustering; text analysis; Arabic language; Arabic text document clustering; Euclidean distance; Jaccard coefficient; Pearson correlation coefficient; averaged Kullback-Leibler divergence; cosine similarity; information retrieval system; online document; Clustering algorithms; Correlation; Entropy; Euclidean distance; Information retrieval; Testing; Arabic Language; Arabic Text Clustering; Information Retrieval Systems; Similarity Measures; Stemming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    I/V Communications and Mobile Network (ISVC), 2010 5th International Symposium on
  • Conference_Location
    Rabat
  • Print_ISBN
    978-1-4244-5996-4
  • Type
    conf
  • DOI
    10.1109/ISVC.2010.5656417
  • Filename
    5656417
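The abstract names five similarity/distance measures without defining them. The following is a minimal Python sketch of those measures, assuming each document is already represented as a term-weight vector (for example TF or TF-IDF) over one shared vocabulary. The function names are hypothetical. The Jaccard coefficient uses the extended (Tanimoto) form for real-valued vectors. The averaged Kullback-Leibler divergence assumes constant weights π1 = π2 = 1/2, which makes it the Jensen-Shannon divergence. That weighting is an assumption and may differ from the exact formulation used in the paper.

```python
import math
from typing import Sequence

# Assumption: each document is a term-weight vector (e.g. TF or TF-IDF)
# over one shared vocabulary, so vectors a and b have equal length.
Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Distance: 0 for identical vectors, larger means less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Similarity in [0, 1] for non-negative weights; raises on zero vectors."""
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))


def jaccard_coefficient(a: Vector, b: Vector) -> float:
    """Extended (Tanimoto) Jaccard: a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)


def pearson_correlation(a: Vector, b: Vector) -> float:
    """Similarity in [-1, 1]; raises ZeroDivisionError for constant vectors."""
    m = len(a)
    mean_a, mean_b = sum(a) / m, sum(b) / m
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def averaged_kl_divergence(a: Vector, b: Vector, pi1: float = 0.5) -> float:
    """pi1 * KL(P || M) + pi2 * KL(Q || M), with M = pi1 * P + pi2 * Q.

    Assumption: P and Q are the term vectors normalised to sum to 1, and the
    weights are constant (0 < pi1 < 1); pi1 = 0.5 gives Jensen-Shannon.
    """
    pi2 = 1.0 - pi1
    sa, sb = sum(a), sum(b)
    total = 0.0
    for x, y in zip(a, b):
        p, q = x / sa, y / sb
        m = pi1 * p + pi2 * q
        # Zero-probability terms contribute nothing (0 * log 0 is taken as 0).
        if p > 0:
            total += pi1 * p * math.log(p / m)
        if q > 0:
            total += pi2 * q * math.log(q / m)
    return total
```

Cosine, Jaccard and Pearson are similarities (higher means closer), whereas Euclidean distance and averaged KL divergence are distances (lower means closer), so a clustering algorithm must treat the two groups accordingly. When stemming is applied before building the vectors, several surface forms map to one term, so the vectors become shorter; this is the smaller representation the abstract refers to.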