• DocumentCode
    981116
  • Title
    PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM
  • Author
    Mak, Man-Wai ; Guo, Jian ; Kung, Sun-Yuan
  • Author_Institution
    Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong
  • Volume
    5
  • Issue
    3
  • fYear
    2008
  • Firstpage
    416
  • Lastpage
    422
  • Abstract
    The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. It was found that PairProSVM outperforms the methods that are based on sequence alignment and amino acid compositions even if most of the homologous sequences have been removed. PairProSVM was evaluated on Huang and Li's and Gardy et al.'s protein data sets. The overall accuracies on these data sets reach 75.3 percent and 91.9 percent, respectively, which are higher than or comparable to those obtained by sequence alignment and composition-based methods.
  • Keywords
    biochemistry; biology computing; cellular biophysics; genetics; molecular biophysics; pattern classification; proteins; support vector machines; Gardy et al.'s protein data set; Huang and Li's protein data sets; PSI-BLAST; PairProSVM; amino acid compositions; functional annotations; homologous sequence alignment; local pairwise profile alignment; pairwise profile alignment scores; protein sequence profiles; protein subcellular localization method; proteomics research; support vector machine classifier; Kernel Methods; Mercer condition; Subcellular localization; Support Vector Machines; profile alignment; Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Proteins; Sequence Alignment; Sequence Analysis, Protein; Software; Structure-Activity Relationship; Subcellular Fractions; Tissue Distribution
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2007.70256
  • Filename
    4384576
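
The abstract describes forming each protein's feature vector from its pairwise alignment scores against all training sequences. The sketch below illustrates only that construction and is not the authors' implementation. As a loudly labeled assumption, it substitutes a plain Smith-Waterman score on raw residues for PSI-BLAST profile alignment, and the scoring values (match, mismatch, gap) and the toy sequences are hypothetical. Taking inner products of the score vectors is one simple way to obtain a symmetric, positive semidefinite kernel matrix; whether the paper builds its kernel exactly this way is not stated in the abstract.

```python
from typing import Callable, List, Sequence


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty.

    Stand-in for profile-to-profile alignment (assumption): it compares
    raw residues, not PSI-BLAST profiles, and the scores are arbitrary.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def score_vector(query: str, training: Sequence[str],
                 score: Callable[[str, str], int] = local_alignment_score
                 ) -> List[float]:
    """Feature vector: alignment scores of `query` against every training item."""
    return [float(score(query, t)) for t in training]


def gram_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Inner products of score vectors (symmetric, positive semidefinite)."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]


# Hypothetical toy sequences, for illustration only.
training = ["MKTAYIAK", "MKTAYLAK", "GGSSGGSS"]
vectors = [score_vector(s, training) for s in training]
kernel = gram_matrix(vectors)
```

A matrix such as `kernel` could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, together with the subcellular location labels of the training proteins.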