Title :
Large-scale attribute selection using wrappers
Author :
Gütlein, Martin ; Frank, Eibe ; Hall, Mark ; Karwath, Andreas
Author_Institution :
Dept. of Comput. Sci., Albert-Ludwigs-Univ. Freiburg, Freiburg
fDate :
March 30 2009-April 2 2009
Abstract :
Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed “optimal” subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
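To make the idea of limiting attribute expansions concrete, here is a minimal sketch, not the authors' implementation. It assumes one reading of the technique: every attribute is ranked once by the score of its single-attribute subset, and greedy forward selection then only ever tries the top `k` ranked attributes, so each step costs at most `k` evaluations instead of one per attribute. The `evaluate` callable (standing in for a wrapper's internal cross-validated accuracy), the parameter names `k` and `max_size`, and the toy scoring function are all hypothetical. `max_size` stands in for a precomputed subset size; how that size is determined is not shown.

```python
from typing import Callable, FrozenSet, List, Optional

# Hypothetical evaluator type: scores an attribute subset (higher is better).
# In a real wrapper this would be the internal cross-validated accuracy of
# the target classifier trained on only those attributes.
Evaluator = Callable[[FrozenSet[int]], float]


def linear_forward_selection(
    n_attributes: int,
    evaluate: Evaluator,
    k: int,
    max_size: Optional[int] = None,
) -> List[int]:
    """Greedy forward selection restricted to the k best-ranked attributes.

    If max_size is None, the search stops when no expansion improves the
    score (the usual forward-selection rule). If max_size is given, this
    sketch ASSUMES the search keeps expanding until exactly that many
    attributes are chosen (or candidates run out).
    """
    # Rank all attributes once by the score of their single-attribute subset.
    ranking = sorted(range(n_attributes),
                     key=lambda a: evaluate(frozenset([a])),
                     reverse=True)
    candidates = ranking[:k]  # only these are ever considered for expansion

    selected: List[int] = []
    best_score = float("-inf")
    while candidates and (max_size is None or len(selected) < max_size):
        # Evaluate adding each remaining candidate; keep the best expansion.
        score, attr = max((evaluate(frozenset(selected + [a])), a)
                          for a in candidates)
        if max_size is None and score <= best_score:
            break  # no improvement: stop
        selected.append(attr)
        candidates.remove(attr)
        best_score = max(best_score, score)
    return selected


# Toy usage with a hypothetical scorer: attributes 1 and 3 are "useful",
# and every attribute carries a small size penalty.
USEFUL = {1, 3}


def toy_evaluate(subset: FrozenSet[int]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset)


if __name__ == "__main__":
    print(linear_forward_selection(6, toy_evaluate, k=3))  # [3, 1]
```

With `k=3`, only attributes 1, 3 and 0 are ever expanded; attributes 2, 4 and 5 are never evaluated in combination with others, which is where the runtime saving over standard forward selection would come from in this sketch.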
Keywords :
information filters; text analysis; forward selection; high-dimensional datasets; internal cross-validation; large-scale attribute selection; linear forward selection; overfitting; wrappers; Algorithm design and analysis; Computational efficiency; Computer science; Filters; Genetic algorithms; Large-scale systems; Performance evaluation; Runtime; Text categorization; Virtual colonoscopy;
Conference_Titel :
Computational Intelligence and Data Mining, 2009. CIDM '09. IEEE Symposium on
Conference_Location :
Nashville, TN
Print_ISBN :
978-1-4244-2765-9
DOI :
10.1109/CIDM.2009.4938668