• DocumentCode
    3268626
  • Title

    Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets

  • Author

    Chitraranjan, Charith D. ; Alnemer, Loai ; Al-Azzam, Omar ; Salem, Saeed ; Denton, Anne M. ; Iqbal, Muhammad J. ; Kianian, Shahryar F.

  • Author_Institution
    Dept. of Comput. Sci., North Dakota State Univ. Fargo, Fargo, ND, USA
  • Volume
    2
  • fYear
    2011
  • fDate
    18-21 Dec. 2011
  • Firstpage
    180
  • Lastpage
    185
  • Abstract
    We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pair-wise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent substring-based SVM classifier outperforms the other classifiers on the sub-cellular localization datasets and performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.
  • Keywords
    biology computing; proteins; support vector machines; SVM k-gram classifier; amino acid sequence; frequent pattern-based algorithm; frequent substring-based sequence classification; nearest neighbor classifier; pairwise sequence-alignment; protein function dataset; protein subcellular localization; reduced amino acid alphabet; support vector machine; Amino acids; Microorganisms; Prediction algorithms; Predictive models; Proteins; Support vector machines; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    978-1-4577-2134-2
  • Type

    conf

  • DOI
    10.1109/ICMLA.2011.71
  • Filename
    6147669
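
The feature-construction steps described in the abstract (mine frequent substrings, map them into a reduced alphabet, use them as features) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the four-group alphabet in `REDUCED_GROUPS`, the function names, the length bounds, and the support threshold are all assumptions made for the example, and the SVM ensemble itself is omitted.

```python
from collections import Counter

# Hypothetical 4-group reduced alphabet (illustrative only; the record
# does not specify which groupings the paper uses).
REDUCED_GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar / small
    "b": "KRH",       # basic
    "a": "DE",        # acidic
}
TO_REDUCED = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group symbol; unknown residues become 'x'."""
    return "".join(TO_REDUCED.get(c, "x") for c in seq)


def frequent_substrings(sequences, min_support, min_len=2, max_len=4):
    """Return {substring: support} for substrings that occur in at least
    min_support sequences (each sequence counted at most once)."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        support.update(seen)
    return {s: c for s, c in support.items() if c >= min_support}


def feature_vector(seq, patterns):
    """Binary presence/absence vector of each pattern in seq."""
    return [1 if p in seq else 0 for p in patterns]


if __name__ == "__main__":
    train = ["AKDE", "VKDE", "LRDD"]
    # Mine in the original 20-letter alphabet, then transform the patterns.
    mined = frequent_substrings(train, min_support=2, min_len=2, max_len=3)
    original_patterns = sorted(mined)
    reduced_patterns = sorted({reduce_sequence(p) for p in mined})
    query = "LRDD"
    print(feature_vector(query, original_patterns))
    print(feature_vector(reduce_sequence(query), reduced_patterns))
```

In this toy run the query shares no frequent substring with the other sequences in the original alphabet, but matches every pattern once both are mapped to the reduced alphabet, which is the kind of similarity between physicochemically related residues that the abstract says reduced alphabets are meant to capture. The resulting vectors, one set per alphabet, could then be passed to any SVM implementation to form the ensemble members.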