DocumentCode :
26836
Title :
Heterogeneous Compression of Large Collections of Evolutionary Trees
Author :
Matthews, Suzanne J.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., United States Mil. Acad., West Point, NY, USA
Volume :
12
Issue :
4
fYear :
2015
fDate :
July-Aug. 1 2015
Firstpage :
807
Lastpage :
814
Abstract :
Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections or enables their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections and will enable scientists to develop novel methods for relating heterogeneous collections of trees.
Keywords :
biology computing; evolution (biological); genetic algorithms; genetics; trees (mathematics); TRZ file; TreeZip compressed file; average space savings; common evolutionary relationships; computational phylogenetics; domain-based compression algorithm; general-purpose compression; heterogeneous collection compression; heterogeneous data; heterogeneous tree collection; large evolutionary trees collections; Bioinformatics; Compression algorithms; Computational biology; IEEE transactions; Phylogeny; Special issues and sections; Collections; Compression; Heterogeneity; Heterogeneous; TreeZip; Trees
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2014.2366756
Filename :
6945876