DocumentCode
981116
Title
PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM
Author
Mak, Man-Wai; Guo, Jian; Kung, Sun-Yuan
Author_Institution
Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong
Volume
5
Issue
3
fYear
2008
Firstpage
416
Lastpage
422
Abstract
The subcellular locations of proteins are important functional annotations, so an effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, that automatically predicts the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and on amino acid composition, even when most of the homologous sequences had been removed. PairProSVM was evaluated on Huang and Li's and Gardy et al.'s protein data sets. The overall accuracies on these data sets reach 75.3 percent and 91.9 percent, respectively, which are higher than or comparable to those obtained by sequence alignment and composition-based methods.
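The abstract's central idea, representing each protein by the vector of its pairwise alignment scores against every training sequence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: PairProSVM aligns PSI-BLAST profiles, whereas this sketch substitutes a plain Smith-Waterman local alignment of raw sequences with illustrative scoring values (match 2, mismatch -1, linear gap -2). The function names and toy sequences are hypothetical, and the SVM training step is not shown.

```python
from typing import List, Sequence


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty.

    Stand-in only (assumption): PairProSVM aligns PSI-BLAST profiles,
    not raw sequences, and these scoring parameters are illustrative.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous residue of `a`
    best = 0                   # best local score seen anywhere in the matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_feature_vector(query: str, training: Sequence[str]) -> List[int]:
    """Represent `query` by its alignment scores against every training sequence."""
    return [local_alignment_score(query, t) for t in training]


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from either data set.
    training = ["MKTAYIAKQR", "GSHMLEDPVA", "MKTAYLAKQR"]
    for seq in training:
        print(pairwise_feature_vector(seq, training))
```

Each printed row is one sequence's score vector, with one dimension per training sequence. Vectors built this way for all training proteins would then serve as the input features of an SVM classifier.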
Keywords
biochemistry; biology computing; cellular biophysics; genetics; molecular biophysics; pattern classification; proteins; support vector machines; Gardy et al.'s protein data set; Huang-and-Li's protein data sets; PSI-BLAST; PairProSVM; amino acid compositions; functional annotations; homologous sequence alignment; local pairwise profile alignment; pairwise profile alignment scores; protein sequence profiles; protein subcellular localization method; proteomics research; support vector machine classifier; Kernel Methods; Mercer condition; Subcellular localization; Support Vector Machines; profile alignment; Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Proteins; Sequence Alignment; Sequence Analysis, Protein; Software; Structure-Activity Relationship; Subcellular Fractions; Tissue Distribution;
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
ieee
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2007.70256
Filename
4384576