• DocumentCode
    2861296
  • Title
    Co-training with a Single Natural Feature Set Applied to Email Classification
  • Author
    Chan, Jason ; Koprinska, Irena ; Poon, Josiah
  • Author_Institution
    The University of Sydney, Australia
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    586
  • Lastpage
    589
  • Abstract
    When dealing with information overload from the Internet, in tasks such as Web page classification and email spam filtering, a new technique called co-training has been shown to be a promising way to build more accurate classifiers. Co-training allows classifiers to learn from fewer labelled documents by taking advantage of the more abundant unclassified documents. However, conventional co-training requires the dataset to be described by two disjoint, natural feature sets that are sufficiently redundant. In many practical situations, it is not obvious how to obtain two natural feature sets. This paper shows that co-training is still beneficial for email classification when only a single natural feature set is used.
  • Keywords
    Electronic mail; Humans; Information filtering; Information filters; Information technology; Internet; Text categorization; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type
    conf
  • DOI
    10.1109/WI.2004.10135
  • Filename
    1410873