DocumentCode :
2414987
Title :
Feature selection for graph kernels
Author :
Tan, Mehmet ; Polat, Faruk ; Alhajj, Reda
Author_Institution :
Dept. of Comp. Eng., TOBB Univ. of Econ. & Technol., Ankara, Turkey
fYear :
2010
fDate :
18-21 Dec. 2010
Firstpage :
632
Lastpage :
637
Abstract :
Graph classification is important in many scientific applications, including various problems in bioinformatics and cheminformatics. There is an increasing need to classify small molecules, represented as graphs, in order to predict properties such as activity, toxicity, or mutagenicity. Using subtrees as the feature set for graph classification with kernel methods has been shown to perform well on small molecules. It is also well known that feature selection can improve classifier performance. However, most graph kernels are not selective about which subtrees they include in the feature set; instead, they use all subtrees with a certain property. We argue that not all of these features are needed for effective classification. In this paper, we investigate the effect of using only a subset of the subtrees as features for graph kernels: we try to identify and keep the useful features and eliminate the remaining subtrees. We propose a masking procedure, which amounts to feature selection, for classifying graphs. Experiments on several molecule classification datasets demonstrate the applicability and effectiveness of the proposed feature selection process.
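The abstract describes graph kernels built on subtree features and a masking step that keeps only some of those subtrees. The following is a minimal Python sketch of that general idea under stated assumptions; it is not the authors' masking procedure, whose selection criterion the abstract does not specify. Assumptions: subtree patterns are enumerated with Weisfeiler-Lehman-style relabeling, the kernel is a dot product of subtree counts restricted to a mask, and the mask keeps subtrees occurring in at least `min_support` graphs. The names `subtree_features`, `select_features`, `masked_kernel`, and the parameter `min_support` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbour nodes
Labels = Dict[int, str]       # node -> atom label, e.g. "C" or "O"


def subtree_features(adj: Graph, labels: Labels, height: int = 1) -> Counter:
    """Count WL-style rooted subtree patterns up to the given height."""
    current = dict(labels)
    feats = Counter(current.values())  # height-0 patterns: single atoms
    for _ in range(height):
        # Each new label encodes the node's previous label plus the sorted
        # multiset of its neighbours' previous labels, i.e. a rooted subtree
        # one level deeper. The comprehension reads the old `current`.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in adj
        }
        feats.update(current.values())
    return feats


def select_features(feature_sets: List[Counter], min_support: int) -> Set[str]:
    """Hypothetical mask: keep subtrees that occur in >= min_support graphs."""
    support = Counter()
    for feats in feature_sets:
        support.update(feats.keys())  # count each subtree once per graph
    return {s for s, c in support.items() if c >= min_support}


def masked_kernel(f1: Counter, f2: Counter, mask: Set[str]) -> int:
    """Dot product of subtree-count vectors restricted to masked features."""
    return sum(f1[s] * f2[s] for s in mask)  # missing keys count as 0


# Toy molecules (hypothetical): C-C-O, C-O, and C-C.
f1 = subtree_features({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
f2 = subtree_features({0: [1], 1: [0]}, {0: "C", 1: "O"})
f3 = subtree_features({0: [1], 1: [0]}, {0: "C", 1: "C"})

mask = select_features([f1, f2, f3], min_support=2)
print(sorted(mask))
print(masked_kernel(f1, f1, mask), masked_kernel(f1, f1, set(f1)))
```

In this toy run the subtree `C(C,O)` appears in only one graph, so the mask drops it and the self-similarity of the first molecule falls from 8 to 7. A real selection step would more likely use class labels; support counting is used here only to keep the sketch short.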
Keywords :
bioinformatics; data handling; pattern classification; trees (mathematics); bioinformatics; classifier performance; feature set; graph classification; graph kernel feature selection; kernel methods; masking procedure; small molecule classification; subtree subset effects; subtrees; Bioinformatics; Chemical compounds; Compounds; Data mining; Kernel; Particle separators; Schedules; bioinformatics; cheminformatics; classification; feature selection; graph kernels;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8306-8
Electronic_ISBN :
978-1-4244-8307-5
Type :
conf
DOI :
10.1109/BIBM.2010.5706643
Filename :
5706643