Title :
Finding Phylogenetically Informative Genes by Estimating Multispecies Gene Entropy
Author_Institution :
Eastern Michigan Univ., Ypsilanti
Abstract :
Although entropy and relative entropy (K-L distance) are widely applied in many areas of bioinformatics, no method has yet been given for computing multispecies gene entropy. This paper presents the first multispecies gene entropy estimation method, approached from a data mining point of view. A self-organizing map (SOM) is employed to mine a multispecies gene set and obtain the probability distribution of each gene in the feature space, which approximates its corresponding probability distribution in the original sequence space. The multispecies gene entropy is then computed from the probability distribution of a gene in the feature space. The phylogenetic applications of the multispecies gene entropy are investigated in an example of resolving incongruence between gene trees and species trees. It is found that genes with the smallest K-L distances to the minimum-entropy gene are more likely to be phylogenetically informative. A K-L distance based gene concatenation approach under gene clustering is proposed to resolve the gene tree and species tree problem. On the same test dataset, the K-L distance based approach not only avoids the ad hoc mechanism of the original gene concatenation method but is also easy to extend to other datasets and free from the prohibitive phylogenetic computation arising from a large number of taxa.
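The entropy and K-L distance steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the SOM has already been trained and that each gene is summarized by a list of per-node hit counts, a hypothetical input format. The function names, the pseudocount smoothing, the base-2 logarithm, and the direction of the K-L distance (from each gene to the minimum-entropy gene) are likewise assumptions made for illustration.

```python
import math


def node_distribution(hit_counts, pseudocount=1e-9):
    """Normalize per-node SOM hit counts into a probability distribution.

    The pseudocount is an assumption of this sketch; it keeps every
    probability nonzero so the K-L distance below stays finite.
    """
    smoothed = [count + pseudocount for count in hit_counts]
    total = sum(smoothed)
    return [value / total for value in smoothed]


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_distance(p, q):
    """K-L distance D(p || q) in bits; q must be nonzero wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rank_by_kl_to_min_entropy(gene_counts):
    """Return the minimum-entropy gene and all genes sorted by their
    K-L distance to it (nearest first).

    gene_counts maps a gene name to its hypothetical per-node hit counts.
    """
    dists = {gene: node_distribution(counts)
             for gene, counts in gene_counts.items()}
    reference = min(dists, key=lambda gene: entropy(dists[gene]))
    ranked = sorted(dists,
                    key=lambda gene: kl_distance(dists[gene], dists[reference]))
    return reference, ranked


# Toy hit counts over a 4-node map; the values are made up for illustration.
example_counts = {
    "geneA": [10, 0, 0, 0],
    "geneB": [4, 3, 2, 1],
    "geneC": [8, 1, 1, 0],
}
reference_gene, ranked_genes = rank_by_kl_to_min_entropy(example_counts)
```

In this toy input the most concentrated gene becomes the reference, and the remaining genes are ordered by how far their node distributions lie from it. How many of the nearest genes to concatenate is a choice the abstract does not specify.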
Keywords :
approximation theory; biology computing; data mining; entropy; genetics; probability; self-organising feature maps; K-L distance approach; bioinformatics; gene clustering; gene concatenation approach; multispecies gene entropy estimation method; probability distribution; self-organizing map; DNA; Distributed computing; Information analysis; Phylogeny; Sequences; Testing; Gene entropy; K-L distance; SOM mining
Conference_Title :
Neural Networks, 2006. IJCNN '06. International Joint Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
DOI :
10.1109/IJCNN.2006.246938