DocumentCode :
463700
Title :
Feature Selection for Pairwise Scoring Kernels with Applications to Protein Subcellular Localization
Author :
Sun-Yuan Kung ; Man-Wai Mak
Author_Institution :
Dept. of Electr. Eng., Princeton Univ., NJ, USA
Volume :
2
fYear :
2007
fDate :
15-20 April 2007
Abstract :
In biological sequence classification, it is common to convert variable-length sequences into fixed-length vectors via pairwise sequence comparison. This pairwise approach, however, can lead to feature vectors with dimension equal to the training set size, causing the curse of dimensionality. This calls for feature selection methods that can weed out irrelevant features to reduce training and recognition time. In this paper, we propose to train an SVM using the full-feature column vectors of a pairwise scoring matrix and select the relevant features based on the support vectors of the SVM. The idea stems from the fact that pairwise scoring matrices are symmetric and support vectors are important for classification. We refer to this approach as vector-index-adaptive SVM (VIA-SVM). We compare VIA-SVM with other feature selection schemes, including SVM-RFE, R-SVM, and a filter method based on symmetric divergence (SD), in protein subcellular localization. Results show that VIA-SVM is able to automatically bound the number of selected features within a small range. We also found that fusion of VIA-SVM and SD can produce more compact feature subsets without decreasing prediction accuracy, and that while VIA-SVM is superior for large feature-set sizes, the combination of SD and VIA-SVM performs better at small feature-set sizes.
Keywords :
biology computing; cellular biophysics; matrix algebra; molecular biophysics; pattern classification; proteins; support vector machines; vectors; biological sequence classification; feature selection methods; filter method; fixed-length vectors; full-feature column vectors; pairwise scoring kernels; pairwise sequence comparison; protein subcellular localization; symmetric divergence; variable-length sequences; vector-index-adaptive SVM; Accuracy; Algorithm design and analysis; Classification algorithms; Databases; Filters; Kernel; Protein engineering; Support vector machine classification; Support vector machines; Symmetric matrices; Feature selection; SVM; kernel methods; pairwise scoring; subcellular localization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.366299
Filename :
4217472