• DocumentCode
    594792
  • Title
    Business email classification using incremental subspace learning
  • Author
    Min Li ; Youngja Park ; Rui Ma ; He Yuan Huang
  • Author_Institution
    IBM Res. China, China
  • fYear
    2012
  • fDate
    11-15 Nov. 2012
  • Firstpage
    625
  • Lastpage
    628
  • Abstract
    We consider a new text classification task: classifying enterprise email messages into sensitive business topics. Identifying sensitive topics in email messages is important for enterprises that need to safeguard critical data such as intellectual property and trade secrets. We apply incremental PCA (Principal Component Analysis) to email representation; it learns a feature subspace incrementally and thereby reduces the feature dimensionality. A linear SVM (Support Vector Machine) is then used to learn the classification models. We validate our approach on 5,000 emails extracted from the Enron email set. Experimental results show that SVM outperforms other classification methods, and that incremental PCA yields a substantial reduction in processing time and a slight increase in classification accuracy compared to SVM with all the features.
  • Keywords
    electronic mail; feature extraction; learning (artificial intelligence); pattern classification; principal component analysis; security of data; support vector machines; text analysis; Enron email set; business email classification; classification models; email extraction; email representation; enterprise email message classification; feature dimensionality; feature subspace; incremental PCA; incremental principal component analysis; incremental subspace learning; intellectual properties; linear SVM; linear support vector machine; sensitive business topics; sensitive topic identification; substantial reduction; text classification task; trade secrets; Accuracy; Companies; Electronic mail; Feature extraction; Principal component analysis; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2012 21st International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4673-2216-4
  • Type
    conf
  • Filename
    6460212
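
The pipeline summarized in the abstract (incremental PCA for dimensionality reduction, followed by a linear SVM) might look roughly like the sketch below. This is not the authors' implementation: it assumes scikit-learn's `IncrementalPCA` and `LinearSVC` as stand-ins, and it uses synthetic features in place of the Enron emails. The helper names (`make_toy_features`, `fit_ipca_svm`), the batch size, and the number of components are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.svm import LinearSVC


def make_toy_features(n_samples=400, n_features=200, seed=0):
    """Synthetic stand-in for email feature vectors (assumption, not Enron data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)       # 0 = ordinary, 1 = "sensitive"
    X = rng.normal(size=(n_samples, n_features))
    X[:, :10] += 3.0 * y[:, None]                # shift 10 features for class 1
    return X, y


def fit_ipca_svm(X, y, n_components=20, batch_size=100):
    """Learn the subspace batch by batch, then train a linear SVM on it."""
    ipca = IncrementalPCA(n_components=n_components)
    for start in range(0, X.shape[0], batch_size):
        # Each batch updates the subspace without revisiting earlier batches.
        ipca.partial_fit(X[start:start + batch_size])
    clf = LinearSVC()
    clf.fit(ipca.transform(X), y)
    return ipca, clf


X, y = make_toy_features()
ipca, clf = fit_ipca_svm(X[:300], y[:300])       # first 300 rows for training
X_test_reduced = ipca.transform(X[300:])         # project held-out rows
accuracy = clf.score(X_test_reduced, y[300:])
print(f"reduced shape: {X_test_reduced.shape}, held-out accuracy: {accuracy:.2f}")
```

In this sketch the classifier sees 20 features instead of 200, which is the kind of reduction the abstract credits for the lower processing time. Each batch passed to `partial_fit` must contain at least as many samples as `n_components`.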