DocumentCode
2861296
Title
Co-training with a Single Natural Feature Set Applied to Email Classification
Author
Chan, Jason ; Koprinska, Irena ; Poon, Josiah
Author_Institution
The University of Sydney, Australia
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
586
Lastpage
589
Abstract
When dealing with information overload from the Internet, in tasks such as the classification of Web pages and the filtering of email spam, a new technique called co-training has been shown to be a promising approach for building more accurate classifiers. Co-training allows classifiers to learn with fewer labelled documents by taking advantage of the more abundant unclassified documents. However, conventional co-training requires the dataset to be described by two disjoint and natural feature sets that are sufficiently redundant. In many practical situations, it is not intuitively obvious how to obtain two natural feature sets. This paper shows that co-training remains beneficial for email classification even when only a single natural feature set is used.
Keywords
Electronic mail; Humans; Information filtering; Information filters; Information technology; Internet; Text categorization; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10135
Filename
1410873