• DocumentCode
    3394625
  • Title

    ProLoc-rGO: Using rule-based knowledge with Gene Ontology terms for prediction of protein subnuclear localization

  • Author

    Huang, Wen-Lin ; Tung, Chun-Wei ; Ho, Shih-Wen ; Ho, Shinn-Ying

  • Author_Institution
    Dept. of Manage. Inf. Syst., Chin Min Inst. of Technol., Miaoli
  • fYear
    2008
  • fDate
    15-17 Sept. 2008
  • Firstpage
    201
  • Lastpage
    206
  • Abstract
    Gene Ontology (GO) annotation is a controlled vocabulary of terms and phrases describing the function of genes and gene products, and it has been applied successfully to predicting subcellular and subnuclear localization. Generally, each gene product is annotated with very few GO terms out of the more than 25,000 annotations currently available. How a protein sequence is represented using GO terms as features plays an important role in designing prediction systems for protein subnuclear localization. Our previous work, ProLoc-GO, can select a small number m out of a large number n of GO terms, where m ≪ n. However, its off-line training time is long, up to several days, even when running on high-speed PC clusters. Therefore, this study proposes an efficient system (ProLoc-rGO) that uses the decision tree method to quickly mine m informative GO terms and acquire interpretable rule-based knowledge for predicting subnuclear localization. Applied to SNL9_80 (714 proteins in nine compartments with <80% identity), ProLoc-rGO mines m = 17 informative GO terms and 17 interpretable rules, and yields training and test accuracies of 84.9% and 78.2%, respectively. For comparison, ProLoc-rGO obtains an accuracy of 82.6% (Matthews correlation coefficient (MCC) = 0.711) on SNL9_80, which is better than the 67.4% (MCC = 0.50) of Nuc-PLoc, which fuses the pseudo-amino acid composition of a protein with its position-specific scoring matrix.
  • Keywords
    bioinformatics; decision trees; genetics; molecular biophysics; ontologies (artificial intelligence); proteins; Matthews correlation coefficient; ProLoc-rGO system; decision tree method; gene ontology annotation; protein sequence; protein subnuclear localization; pseudoamino acid composition; Bioinformatics; Clustering algorithms; DNA; Decision trees; Genomics; Ontologies; Protein sequence; Sequences; Testing; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
  • Conference_Location
    Sun Valley, ID
  • Print_ISBN
    978-1-4244-1778-0
  • Electronic_ISBN
    978-1-4244-1779-7
  • Type

    conf

  • DOI
    10.1109/CIBCB.2008.4675779
  • Filename
    4675779
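
The abstract describes representing each protein by its GO terms and using a decision tree to pick out a few informative terms. As a rough illustration only, and not the ProLoc-rGO implementation, the sketch below ranks GO terms by information gain, a split criterion commonly used by decision trees. The function names, the toy proteins, and the compartment labels are all hypothetical, and the GO identifiers serve only as example strings.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(proteins, labels, go_term):
    """Entropy reduction from splitting on presence/absence of one GO term.

    `proteins` is a list of sets of GO identifiers (one set per protein),
    i.e. a sparse binary feature representation.
    """
    with_term = [lab for terms, lab in zip(proteins, labels) if go_term in terms]
    without_term = [lab for terms, lab in zip(proteins, labels) if go_term not in terms]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - remainder


def rank_go_terms(proteins, labels):
    """Return all observed GO terms, most informative first."""
    vocabulary = sorted(set().union(*proteins))
    return sorted(
        vocabulary,
        key=lambda term: information_gain(proteins, labels, term),
        reverse=True,
    )


# Hypothetical toy data: four proteins in two compartments.
toy_proteins = [
    {"GO:0005730", "GO:0005634"},
    {"GO:0005730", "GO:0005634"},
    {"GO:0016607", "GO:0005634"},
    {"GO:0016607", "GO:0005634"},
]
toy_labels = ["nucleolus", "nucleolus", "nuclear speckle", "nuclear speckle"]

if __name__ == "__main__":
    for term in rank_go_terms(toy_proteins, toy_labels):
        print(term, information_gain(toy_proteins, toy_labels, term))
```

In this toy set, a term shared by every protein carries no information about the compartment and ranks last, while a term that appears in only one compartment separates the classes perfectly. A full decision tree would apply such a criterion recursively, and each root-to-leaf path would read as one if-then rule over GO terms.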