Title :
Active sampling for feature selection
Author :
Veeramachaneni, Sriharsha ; Avesani, Paolo
Author_Institution :
ITC-IRST, Trento, Italy
Abstract :
In knowledge discovery applications, where new features are to be added, an acquisition policy can help select the features to be acquired based on their relevance and the cost of extraction. This can be posed as a feature selection problem where the feature values are not known in advance. We propose a technique to actively sample the feature values with the ultimate goal of choosing between alternative candidate features with minimum sampling cost. Our heuristic algorithm is based on extracting candidate features in a region of the instance space where the feature value is likely to alter our knowledge the most. An experimental evaluation on a standard database shows that it is possible to outperform a random subsampling policy in terms of the accuracy in feature selection.
Keywords :
data mining; feature extraction; sampling methods; active sampling; feature selection; heuristic algorithm; knowledge acquisition; knowledge discovery; Agriculture; Costs; Data acquisition; Data analysis; Diseases; Electronic mail; Heuristic algorithms; Spatial databases
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1251003