DocumentCode :
1338261
Title :
Efficient Design of Bio-Basis Function to Predict Protein Functional Sites Using Kernel-Based Classifiers
Author :
Maji, Pradipta ; Das, Chandra
Author_Institution :
Machine Intell. Unit, Indian Stat. Inst., Kolkata, India
Volume :
9
Issue :
4
fYear :
2010
Firstpage :
242
Lastpage :
249
Abstract :
To apply powerful kernel-based pattern recognition algorithms, such as support vector machines, to the prediction of functional sites in proteins, amino acids must be encoded prior to input. In this regard, a new string kernel function, termed the modified bio-basis function, is proposed that maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel function is developed based on the conventional bio-basis function and, like the conventional kernel function, needs a bio-basis string as a support. The concept of zone of influence of a bio-basis string is introduced in the proposed kernel function to take into account the influence of each bio-basis string in the nonnumerical sequence space. An efficient method is described to select a set of bio-basis strings for the proposed kernel function, integrating the Fisher ratio and a novel concept of degree of resemblance. The integration enables the method to select a reduced set of relevant and nonredundant bio-basis strings.
Keywords :
bioinformatics; molecular biophysics; molecular configurations; pattern classification; proteins; support vector machines; amino acids encoding; biobasis function design; biobasis string zone of influence; kernel based classifiers; kernel based pattern recognition algorithms; modified biobasis function; nonnumerical sequence space; numerical feature space; protein functional site prediction; string kernel function; Biological information theory; Pattern recognition; Sequences; functional site prediction; sequence analysis; Algorithms; Binding Sites; Computational Biology; Information Storage and Retrieval; Models, Molecular; Neural Networks (Computer); Pattern Recognition, Automated; Protein Binding; Sequence Analysis, Protein
fLanguage :
English
Journal_Title :
NanoBioscience, IEEE Transactions on
Publisher :
IEEE
ISSN :
1536-1241
Type :
jour
DOI :
10.1109/TNB.2010.2080684
Filename :
5587898