• DocumentCode
    18313
  • Title

    Sequence-Based Prediction of microRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines

  • Author

    Jian-Sheng Wu ; Zhi-Hua Zhou

  • Author_Institution
    Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • fDate
    May-June 2013
  • Firstpage
    752
  • Lastpage
    759
  • Abstract
    The recognition of microRNA (miRNA)-binding residues in proteins is helpful to understand how miRNAs silence their target genes. It is difficult to use existing computational methods to predict miRNA-binding residues in proteins due to the lack of training examples. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning deals with methods for automatically exploiting unlabeled data in addition to labeled data to improve learning performance, where no human intervention is assumed. In addition, miRNA-binding proteins almost always contain far fewer binding than nonbinding residues, and cost-sensitive learning has been deemed a good solution to the class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information of the amino acid sequence (position-specific scoring matrices), the conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves good performance, with an F1 score of 26.23 ± 2.55% and an AUC value of 0.805 ± 0.020, superior to existing approaches for the recognition of RNA-binding residues. A web server called SARS is built and freely available for academic usage.
  • Keywords
    RNA; biochemistry; molecular biophysics; proteins; support vector machines; CS-LapSVM; amino acid sequence; biochemical properties; cost sensitive Laplacian support vector machines; evolutionary information; hybrid feature; microRNA binding residues; mutual interaction propensities; position specific scoring matrices; semisupervised learning; sequence based prediction; unlabeled data; Amino acids; Laplace equations; Predictive models; Standards; Training; Laplacian support vector machine; cost-sensitive learning; miRNA-binding residues;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.75
  • Filename
    6550864