Author_Institution :
Inst. of Bioinformatics, Nat. Yang-Ming Univ., Taipei, Taiwan
Abstract :
RNA-binding proteins play many essential roles in the regulation of gene expression. In the cell, mRNA molecules and their precursors are always bound by proteins. RNA-binding proteins that are involved in RNA processing, cellular localization, gene expression, regulation, transcription and translation have been identified, and structural domains involved in RNA recognition have been described (Siomi et al. 1997; Cusack et al. 1999; Stefl et al. 2005). RNA-binding proteins are an extremely diverse group of proteins, reflecting the different functional requirements of different types of RNA molecules (Andreev et al. 2004). However, despite their obvious functional importance, the specific mechanisms of protein-RNA interactions are still poorly understood. Identification of putative RNA-binding residues in these proteins is an important and challenging problem of molecular recognition. Despite the significant increase in the number of structures of RNA-protein complexes in the last few years, the molecular basis of specificity remains unclear even for the best-studied protein families. Very few studies (Jeong et al. 2004) have so far addressed the important problem of predicting RNA-interacting sites in proteins, a critical goal in the field of molecular recognition. We have developed a method for identification of RNA-binding residues using machine learning approaches. Several machine learning techniques and feature selection methods based on protein composition, sequence, charge and structural information have been investigated for their classification accuracy. In a detailed cross-validation analysis using nonredundant RNA-protein complexes deposited in the Protein Data Bank, our method shows satisfactory performance for identification of RNA-binding residues.
Keywords :
biology computing; chemistry computing; genetics; learning (artificial intelligence); macromolecules; proteins; RNA processing; RNA recognition; RNA-binding protein residues; RNA-binding residue identification; RNA-interacting sites; cellular localization; cross validation analysis; feature selection; gene expression; gene regulation; gene transcription; gene translation; mRNA molecules; machine learning; molecular recognition; nonredundant RNA-protein complexes; protein RNA interactions; protein charge; protein composition; protein data bank; protein sequence; protein structural information; putative RNA-binding residues; Bioinformatics; Gene expression; Genomics; Machine learning; Neural networks; Performance analysis; Proteins; RNA; Regulators; Spectroscopy;