  • DocumentCode
    2089075
  • Title
    Prediction of Protein Functional Sites Using Novel String Kernels
  • Author
    Das, Chandra ; Maji, Pradipta
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Netaji Subhash Eng. Coll., Kolkata, India
  • fYear
    2008
  • fDate
    17-20 Dec. 2008
  • Firstpage
    127
  • Lastpage
    130
  • Abstract
    In most pattern recognition algorithms, amino acids cannot be used directly as inputs because they are nonnumerical variables; they must therefore be encoded before input. In this regard, a novel string kernel is introduced, which maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel is developed from the conventional bio-basis function and is termed the novel bio-basis function. It is designed on the principle of asymmetry of biological distance, which is calculated using an amino acid mutation matrix. The concept of the zone of influence of a bio-basis is introduced in the proposed string kernel to normalize the asymmetric distance. An efficient method for selecting bio-bases for the novel string kernel is described, which integrates the concepts of the Fisher ratio and the degree of resemblance. The effectiveness of the proposed string kernel and bio-bases selection method, along with a comparison with the existing kernel and related selection methods, is demonstrated on different protein data sets.
  • Keywords
    biology computing; matrix algebra; pattern recognition; proteins; Fisher ratio; amino acid mutation matrix; amino acids; bio-bases selection method; bio-basis function; biological distance; nonnumerical sequence space; novel string kernels; numerical feature space; pattern recognition algorithms; protein data sets; protein functional sites; Biological information theory; Biological system modeling; Computer science; Educational institutions; Encoding; Genetic mutations; Information technology; Kernel; Protein engineering; Bioinformatics; Sequence analysis; Support vector machine
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology, 2008. ICIT '08. International Conference on
  • Conference_Location
    Bhubaneswar
  • Print_ISBN
    978-1-4244-3745-0
  • Type
    conf
  • DOI
    10.1109/ICIT.2008.11
  • Filename
    4731312