Title :
Predicting the Pro-Longevity or Anti-Longevity Effect of Model Organism Genes with New Hierarchical Feature Selection Methods
Author :
Wan, Cen; Freitas, Alex A.; de Magalhães, João Pedro
Author_Institution :
Sch. of Comput., Univ. of Kent, Canterbury, UK
Abstract :
Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data, in order to try to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms' genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.
Keywords :
Bayes methods; ageing; bioinformatics; data mining; feature selection; genetics; ontologies (artificial intelligence); pattern classification; 1-nearest neighbour classifiers; Caenorhabditis elegans; Drosophila melanogaster; Mus musculus; Saccharomyces cerevisiae; antilongevity effect; classification task; data mining; gene ontology terms; hierarchical feature selection methods; model organism genes; naive Bayes; prolongevity effect; Accuracy; Aging; Bioinformatics; Computational biology; IEEE transactions; Organisms; Testing; 1-nearest neighbour; Ageing; classification; data mining; feature selection; gene ontology; naïve Bayes;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2014.2355218