DocumentCode :
3685838
Title :
Comparison of unsupervised feature selection methods for high-dimensional regression problems in prediction of peptide binding affinity
Author :
Ferdi Sarac;Volkan Uslan;Huseyin Seker;Ahmed Bouridane
Author_Institution :
Faculty of Engineering and Environment, The University of Northumbria at Newcastle, Newcastle-upon-Tyne, The United Kingdom
fYear :
2015
Firstpage :
8173
Lastpage :
8176
Abstract :
Identification of a robust set of predictive features is one of the most important steps in the construction of clustering, classification and regression models from many thousands of features. Although there have been various attempts to select predictive feature sets from high-dimensional data sets in classification and clustering, there have been only limited attempts to study this in regression problems. As semi-supervised and supervised feature selection methods tend to identify noisy features in addition to discriminative variables, unsupervised feature selection methods (USFSMs) are generally regarded as a less biased approach. Therefore, in this study, along with the entire feature set, four different USFSMs are considered for the quantitative prediction of peptide binding affinities, one of the most challenging post-genome regression problems, in which the dimensionality is very high compared to the extremely small number of samples. As USFSMs are independent of any predictive method, support vector regression was then utilised to assess the quality of prediction. Given three different peptide binding affinity data sets, the results suggest that the regression performance of USFSMs generally depends on the data set. No particular method yields the best performance, in contrast to their performance in classification problems. However, a closer investigation of the results appears to suggest that the spectral regression-based approach yields slightly better performance. To the best of our knowledge, this is the first study that presents a comprehensive comparison of USFSMs in such high-dimensional regression problems, particularly in the biological domain with an application in the prediction of peptide binding affinity, and provides a number of practical suggestions for future practitioners.
Keywords :
"Peptides","Predictive models","Support vector machines","Noise measurement","Clustering algorithms","Prediction algorithms"
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE
ISSN :
1094-687X
Electronic_ISBN :
1558-4615
Type :
conf
DOI :
10.1109/EMBC.2015.7320291
Filename :
7320291