• DocumentCode
    2875309
  • Title
    CCM: A Text Classification Model by Clustering
  • Author
    Nizamani, Sarwat ; Memon, Nasrullah ; Wiil, Uffe Kock ; Karampelas, Panagiotis
  • Author_Institution
    Maersk Mc-Kinney Moller Inst., Univ. of Southern Denmark, Odense, Denmark
  • fYear
    2011
  • fDate
    25-27 July 2011
  • Firstpage
    461
  • Lastpage
    467
  • Abstract
    This paper presents a new Cluster-based Classification Model (CCM) for suspicious email detection and other text classification tasks. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results show that the CCM outperforms traditional classification models as well as the boosting algorithm for suspicious email detection on a terrorism-domain email dataset and for topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster-based approach to text classification tasks simplifies the model while increasing accuracy.
  • Keywords
    electronic mail; information resources; pattern classification; pattern clustering; terrorism; text analysis; CCM; Newsgroups datasets; Reuters-21578; boosting algorithm; cluster based classification model; email detection; terrorism domain email dataset; text classification model; topic categorization; Accuracy; Boosting; Classification algorithms; Clustering algorithms; Electronic mail; Text categorization; Training; Classification; Clustering; ID3; K-means; NB; SVM
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-1-61284-758-0
  • Electronic_ISBN
    978-0-7695-4375-8
  • Type
    conf
  • DOI
    10.1109/ASONAM.2011.76
  • Filename
    5992615