Author/Authors :
Ebrahimie, Esmaeil (Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran); Ebrahimi, Mansour (Bioinformatics Research Group, Green Research Center, Qom University, Qom, Iran); Rahpayma, Narjes (Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran)
Abstract :
The study used various screening techniques, clustering, decision tree models, generalized rule induction (GRI) association models, and molecular phylogenetic relationships to search for patterns of halophilicity and to identify features that contribute to the salt stability of halolysins. We found that Met was the only N-terminal amino acid in halolysin proteins, whereas other amino acids occurred at that position in other proteases and in termitase. Feature selection modeling identified 83 protein features as important. Only one peer group was found, with an anomaly index of 2.42, which declined to 1.87 when the analysis was rerun using only the selected important features. The trees generated by the various decision tree models ranged in depth from 1 to 5 branches. Compared with datasets without feature selection, the number of peer groups in the clustering models was significantly reduced (p < 0.05). In most decision tree models, the frequency of Gly-Gly was the most important feature in the rule sets, and this feature also appeared in the antecedents of most GRI association rules. Significant differences (p < 0.001) were found in charged amino acid content between halolysins and other proteins: halolysins contained more Asp and Glu, whereas other proteases contained more hydrophobic and aliphatic residues.
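The sequence features named in the abstract (N-terminal residue, Gly-Gly frequency, Asp/Glu content, hydrophobic and aliphatic content) can be illustrated with a small Python sketch. It is not the authors' feature-extraction pipeline, whose exact definitions the abstract does not give. It assumes that "frequency of Gly-Gly" means the fraction of adjacent residue pairs that are Gly-Gly, and that the residue sets `CHARGED`, `HYDROPHOBIC`, and `ALIPHATIC` are as defined below. These sets, and the function names, are hypothetical choices, and other groupings are common.

```python
"""Hypothetical sketch of simple protein sequence-composition features."""

# Residue groupings (assumptions; published definitions vary).
ACIDIC = set("DE")             # Asp, Glu
CHARGED = set("DEKR")          # Asp, Glu, Lys, Arg (His excluded here)
HYDROPHOBIC = set("AVILMFWC")  # one common hydrophobic set
ALIPHATIC = set("AVIL")        # Ala, Val, Ile, Leu


def dipeptide_frequency(seq: str, dipeptide: str = "GG") -> float:
    """Fraction of adjacent residue pairs equal to `dipeptide` (default Gly-Gly)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    return pairs.count(dipeptide.upper()) / len(pairs)


def fraction_in(seq: str, residues: set) -> float:
    """Fraction of residues in `seq` that belong to the set `residues`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(aa in residues for aa in seq) / len(seq)


def composition_features(seq: str) -> dict:
    """Return a small, illustrative feature vector for one protein sequence."""
    return {
        "n_terminal_met": seq[:1].upper() == "M",
        "gly_gly_freq": dipeptide_frequency(seq, "GG"),
        "asp_glu_frac": fraction_in(seq, ACIDIC),
        "charged_frac": fraction_in(seq, CHARGED),
        "hydrophobic_frac": fraction_in(seq, HYDROPHOBIC),
        "aliphatic_frac": fraction_in(seq, ALIPHATIC),
    }


if __name__ == "__main__":
    # Made-up toy sequence, not a real halolysin.
    print(composition_features("MGGDEAVLKG"))
```

In a study like this one, vectors of this kind would be computed for every sequence and passed to feature selection, clustering, decision tree, or association-rule models. The toy sequence is only for illustration.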