DocumentCode
2114468
Title
Improving the prediction of sub-cellular locations of proteins with a particle swarm optimization-based boosting strategy
Author
Garcia-Lopez, S. ; Jaramillo-Garzon, Jorge Alberto ; Castellanos-Dominguez, German
Author_Institution
Grupo de Control y Procesamiento Digital de Senales, Univ. Nac. de Colombia, Manizales, Colombia
fYear
2012
fDate
Aug. 28, 2012 - Sept. 1, 2012
Firstpage
6313
Lastpage
6316
Abstract
Learning from imbalanced data sets presents an important challenge to the machine learning community. Traditional classification methods, which seek to minimize the overall error rate on the whole training set, perform poorly on imbalanced data because they assume a relatively balanced class distribution and place too much emphasis on the majority class. This is a common scenario when predicting the sub-cellular locations of proteins, since proteins belonging to certain locations are naturally more abundant or have been more extensively studied. In this work, a new method for learning from imbalanced data, called SwarmBoost, is proposed to reduce class overlap and noise in imbalanced data sets and to improve prediction performance. The method combines oversampling, subsampling based on particle swarm optimization, and ensemble methods. Our results show that SwarmBoost matches and, in several cases, outperforms other common boosting algorithms such as DataBoost-Im and AdaBoost, making it a useful tool for improving sub-cellular location prediction.
Keywords
biology computing; cellular biophysics; learning (artificial intelligence); molecular biophysics; optimisation; proteins; AdaBoost; DataBoost-Im; SwarmBoost; boosting algorithms; ensemble methods; imbalanced data set learning; machine learning community; noise reduction; oversampling; particle swarm optimization-based boosting strategy; protein subcellular locations; relatively balanced class distribution; traditional classification methods; whole training set; Boosting; Measurement; Particle swarm optimization; Prediction algorithms; Proteins; Training; Algorithms; Proteins; Subcellular Fractions;
fLanguage
English
Publisher
ieee
Conference_Titel
Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE
Conference_Location
San Diego, CA
ISSN
1557-170X
Print_ISBN
978-1-4244-4119-8
Electronic_ISBN
1557-170X
Type
conf
DOI
10.1109/EMBC.2012.6347437
Filename
6347437