• DocumentCode
    2990001
  • Title

    Designing predictors of halophilic and non-halophilic proteins using support vector machines

  • Author

    Hui-Ling Huang ; Srinivasulu, Yerukala Sathipati ; Charoenkwan, Phasit ; Hua-Chin Lee ; Shinn-Ying Ho

  • Author_Institution
    Dept. of Biol. Sci. & Technol., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2013
  • fDate
    16-19 April 2013
  • Firstpage
    230
  • Lastpage
    237
  • Abstract
    Identifying the molecular features that cause halophilicity in halostable organisms helps in understanding halophilic adaptation. In this study, we propose a machine learning method for predicting halophilic proteins. The study has six stages. First, we establish a non-redundant dataset of halophilic proteins collected from the NCBI, UniProtKB and EMBL-EBI databases. The dataset consists of 245 positive and negative proteins with sequence identity <25%. Second, the protein sequences are represented by three types of feature vector sets: amino acid composition, dipeptide composition, and physicochemical properties. Third, we propose three classifiers based on the support vector machine (SVM) to distinguish halophilic from non-halophilic proteins. Fourth, the independent test accuracies of the three classifiers are all above 83%. Fifth, an inheritable biobjective combinatory genetic algorithm is used to select a set of 11 physicochemical properties (PCPs). Sixth, the abundant amino acids, the strongly differing dipeptides (amino acid pairs) and the 11 informative PCPs can support the analysis of halophilic and non-halophilic proteins.
  • Keywords
    biology computing; genetic algorithms; learning (artificial intelligence); molecular biophysics; proteins; support vector machines; EMBL-EBI database; NCBI database; PCP; SVM; Uniprotkb database; amino acid composition; amino acid pair; amino acids; biobjective combinatory genetic algorithm; designing predictors; dipeptide composition; dipeptides; feature vector sets; halophilic adaption; halophilicity; halostable organisms; machine learning method; molecular features; negative proteins; nonhalophilic proteins; nonredundant dataset; physicochemical property; positive protein; prediction method; protein sequences; support vector machines; Accuracy; Amino acids; Bioinformatics; Genetic algorithms; Organisms; Proteins; Support vector machines; Genetic algorithms; Halophilic proteins; Physicochemical properties; SVM;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2013 IEEE Symposium on
  • Conference_Location
    Singapore
  • Type

    conf

  • DOI
    10.1109/CIBCB.2013.6595414
  • Filename
    6595414
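
The abstract names amino acid composition and dipeptide composition as two of its three feature sets but gives no formulas. The following is a minimal sketch, assuming the commonly used definitions: the relative frequency of each of the 20 standard residues, and the relative frequency of each of the 400 ordered pairs of adjacent residues. The function names and the handling of non-standard residues are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (fixed order defines the vector layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    total = len(residues)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[p] / total for p in DIPEPTIDES]


if __name__ == "__main__":
    example = "MDEDLDEAAE"  # hypothetical sequence, for illustration only
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(len(aac), len(dpc))
```

Vectors of this kind could then be passed to an SVM classifier; the kernel and parameter settings used in the paper are not stated in this record.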