DocumentCode
827151
Title
Benchmarking attribute selection techniques for discrete class data mining
Author
Hall, Mark A. ; Holmes, Geoffrey
Author_Institution
Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
Volume
15
Issue
6
fYear
2003
Firstpage
1437
Lastpage
1447
Abstract
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model building phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations, and as a result very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes: C4.5 and naive Bayes.
Keywords
Bayes methods; data mining; feature extraction; learning (artificial intelligence); pattern classification; C4.5 learning scheme; attribute ranking; attribute selection technique benchmarking; attribute utility estimation; classification learner; data engineering; discrete class data mining; naive Bayes learning scheme; predictive attribute identification; search; supervised classification; Buildings; Data engineering; Data mining; Decision trees; Fasteners; Phase noise; Predictive models; Reliability engineering; Testing; Training data;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2003.1245283
Filename
1245283