DocumentCode :
84560
Title :
Maximizing Deep Coalescence Cost
Author :
Gorecki, Pawel ; Eulenstein, Oliver
Author_Institution :
Dept. of Math., Inf. & Mech., Univ. of Warsaw, Warsaw, Poland
Volume :
11
Issue :
1
fYear :
2014
fDate :
Jan.-Feb. 2014
Firstpage :
231
Lastpage :
242
Abstract :
The minimizing deep coalescence (MDC) problem seeks a species tree that reconciles the given gene trees with the minimum number of deep coalescence events, called the deep coalescence (DC) cost. To better assess MDC species trees, we investigate a basic mathematical property of the DC cost, called the diameter. Given a gene tree, a species tree, and a leaf labeling function that assigns each leaf-gene of the gene tree to the leaf-species in the species tree from which it was sampled, the DC cost describes the discordance between the trees caused by deep coalescence events. The diameter of a gene tree and a species tree is the maximum DC cost across all leaf labelings for these trees. We prove fundamental mathematical properties that precisely describe these diameters for bijective and general leaf labelings, and present efficient algorithms to compute the diameters and their corresponding leaf labelings. In particular, we describe an optimal, i.e., linear-time, algorithm for the bijective case. Finally, in an experimental study we demonstrate that the average diameters between a gene tree and a species tree grow significantly slower than their naive upper bounds, suggesting that our exact bounds can significantly improve the assessment of DC costs when using diameters.
Keywords :
bioinformatics; evolution (biological); genetics; trees (mathematics); MDC problem; MDC species tree; basic mathematical property; bijective leaf labelings; deep coalescence cost maximization; diameter computation algorithms; gene tree diameter; gene trees; general leaf labelings; leaf labeling computation algorithms; leaf labeling function; leaf-gene assignment; leaf-species; maximum DC cost; minimizing deep coalescence problem; minimum deep coalescence event number; naive upper bounds; optimal linear time algorithm; species tree diameter; Bioinformatics; Computational biology; Joining processes; Labeling; Phylogeny; Shape; Vegetation; Deep coalescence; bijective leaf labeling; cost function; diameter; gene tree; leaf labeling; species tree; tree reconciliation;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.144
Filename :
6657669