• DocumentCode
    1799743
  • Title
    Towards Designing an Email Classification System Using Multi-view Based Semi-supervised Learning
  • Author
    Wenjuan Li ; Weizhi Meng ; Zhiyuan Tan ; Yang Xiang
  • Author_Institution
    Dept. of Comput. Sci., City Univ. of Hong Kong, Hong Kong, China
  • fYear
    2014
  • fDate
    24-26 Sept. 2014
  • Firstpage
    174
  • Lastpage
    181
  • Abstract
    The goal of email classification is to separate user emails into spam and legitimate messages. Many supervised learning algorithms have been developed for this task, and these algorithms require a large amount of labeled training data. However, data labeling is labor-intensive and requires in-depth domain knowledge, so only a very small proportion of the data can be labeled in practice. This bottleneck greatly degrades the effectiveness of supervised email classification systems. To address this problem, we first identify some critical issues regarding supervised machine learning-based email classification. We then propose an effective classification model based on multi-view disagreement-based semi-supervised learning. The motivation for combining multi-view and semi-supervised learning is that multiple views can provide richer information for classification, which is often ignored in the literature, and that semi-supervised learning provides the capability of coping with both labeled and unlabeled data. In the evaluation, we demonstrate that multi-view data can improve email classification compared with single-view data, and that the proposed model working with our algorithm can achieve better performance than existing similar algorithms.
  • Keywords
    learning (artificial intelligence); pattern classification; unsolicited e-mail; classification model; email classification system; labeled data; multiview data; multiview disagreement-based semisupervised learning; single view data; spam; unlabeled data; Data models; Electronic mail; Feature extraction; Semisupervised learning; Supervised learning; Support vector machines; Training; Email Classification; Machine Learning Applications; Multi-View; Network Security; Semi-Supervised Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1109/TrustCom.2014.26
  • Filename
    7011248