DocumentCode
1340745
Title
Subcellular Localization Prediction through Boosting Association Rules
Author
Yoon, Yongwook; Lee, G.G.
Author_Institution
Dept. of Comput. Sci. & Eng., Pohang Univ. of Sci. & Technol. (POSTECH), Pohang, South Korea
Volume
9
Issue
2
fYear
2012
Firstpage
609
Lastpage
618
Abstract
Computational methods for predicting protein subcellular localization have used various types of features, including N-terminal sorting signals, amino acid compositions, and text annotations from protein databases. Our approach does not use biological knowledge such as sorting signals or homologues; it uses only protein sequence information. The method divides a protein sequence into short k-mer sequence fragments, which can be mapped to word features as in document classification. A large number of class association rules are mined from protein sequence examples that span the N-terminus to the C-terminus. A boosting algorithm is then applied to those rules to build the final classifier. Experimental results on benchmark data sets show that our method performs well in terms of both classification performance and test coverage. The results also imply that the k-mer sequence features that determine subcellular location do not necessarily occur at specific positions in a protein sequence. An online prediction service implementing our method is available at http://isoft.postech.ac.kr/research/BCAR/subcell.
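The k-mer decomposition described in the abstract can be illustrated with a minimal sketch. It assumes overlapping fragments, a fixed k of 3, and plain position-independent counts; the function name `kmer_features` and the example sequence are hypothetical, and the record does not state the authors' actual windowing, k values, or rule-mining and boosting steps, none of which are shown here.

```python
from collections import Counter


def kmer_features(sequence, k=3):
    """Count overlapping k-mers in a protein sequence, ignoring position.

    Assumption: fragments overlap with a stride of 1. Each k-mer then
    plays the role of a "word" in a bag-of-words document representation.
    """
    seq = sequence.strip().upper()
    # range() is empty when the sequence is shorter than k,
    # so short inputs simply yield an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy sequence, for illustration only.
features = kmer_features("MKTMKT", k=3)
print(features)
```

Because the counts discard position, a fragment contributes the same feature whether it occurs near the N-terminus or the C-terminus, which is the property the abstract highlights.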
Keywords
bioinformatics; data mining; molecular biophysics; pattern recognition; proteins; proteomics; N-terminal sorting signals; amino acid compositions; boosting association rules; class association rules; document classification; k-mer sequence fragments; protein databases; protein sequence information; protein subcellular localization; text annotations; Accuracy; Amino acids; Association rules; Boosting; Databases; Proteins; Training; Clustering classification and association rules; bioinformatics (genome or protein) databases; Animals; Cluster Analysis; Computational Biology; Databases, Protein; Intracellular Space; Pattern Recognition, Automated; Plant Proteins; Sequence Analysis, Protein; Support Vector Machines
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2011.131
Filename
6035673