• DocumentCode
    57527
  • Title
    Predicting the Pro-Longevity or Anti-Longevity Effect of Model Organism Genes with New Hierarchical Feature Selection Methods
  • Author
    Cen Wan ; Freitas, Alex A. ; de Magalhaes, Joao Pedro
  • Author_Institution
    Sch. of Comput., Univ. of Kent, Canterbury, UK
  • Volume
    12
  • Issue
    2
  • fYear
    2015
  • fDate
    March-April 2015
  • Firstpage
    262
  • Lastpage
    275
  • Abstract
    Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data, in order to try to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms' genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.
  • Keywords
    Bayes methods; ageing; bioinformatics; data mining; feature selection; genetics; ontologies (artificial intelligence); pattern classification; 1-nearest neighbour classifiers; Caenorhabditis elegans; Drosophila melanogaster; Mus musculus; Saccharomyces cerevisiae; antilongevity effect; classification task; data mining; gene ontology terms; hierarchical feature selection methods; model organism genes; naive Bayes; prolongevity effect; Accuracy; Aging; Bioinformatics; Computational biology; IEEE transactions; Organisms; Testing; 1-nearest neighbour; Ageing; classification; data mining; feature selection; gene ontology; naïve bayes;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2355218
  • Filename
    6892956