Title :
TreeZip: A New Algorithm for Compressing Large Collections of Evolutionary Trees
Author :
Matthews, Suzanne J. ; Sul, Seung-Jin ; Williams, Tiffani L.
Author_Institution :
Dept. of Comput. Sci. & Eng., Texas A&M Univ., College Station, TX, USA
Abstract :
The primary advantage of TreeZip is its use of semantic compression, which allows us to store each tree relationship only once. Phylogenetic trees are stored in a format known as the Newick representation, which uses nested parentheses to represent the evolutionary relationships (or subtrees) within a phylogenetic tree. TreeZip uses two universal hashing functions to compactly represent all of the shared evolutionary relationships in the tree collection. In our previous work, we successfully used universal hash functions in our HashCS and HashRF algorithms, which build consensus trees and topological distance matrices, respectively. Once the hash table is constructed, TreeZip writes the compressed file containing the information in the collection of phylogenetic trees.
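The idea of hashing shared relationships can be illustrated with a minimal sketch. It is not the TreeZip implementation: the function names (`clades`, `make_hash`, `build_table`), the two moduli, the fixed seed, and the restriction to simplified Newick strings (taxon names and parentheses only, no branch lengths) are all assumptions made for illustration. Each subtree is turned into a bit vector over the sorted taxa, hashed by two universal hash functions of the form "sum of random coefficients mod m", and recorded together with the IDs of the trees that contain it.

```python
import random


def clades(newick):
    """Return the taxon set under each internal node of a Newick string.

    Assumes simplified Newick: names and nested parentheses only.
    """
    stack, found, name = [], [], ""
    for ch in newick:
        if ch in "(),;":
            if name.strip() and stack:
                stack[-1].add(name.strip())  # finished reading a taxon name
            name = ""
            if ch == "(":
                stack.append(set())          # open a new subtree
            elif ch == ")":
                done = frozenset(stack.pop())
                found.append(done)
                if stack:
                    stack[-1] |= done        # parent contains the child's taxa
        else:
            name += ch
    return found


def make_hash(n_taxa, modulus, rng):
    """Universal hash: sum of random coefficients for set bits, mod modulus."""
    coeffs = [rng.randrange(modulus) for _ in range(n_taxa)]
    return lambda bits: sum(c for c, b in zip(coeffs, bits) if b) % modulus


def build_table(trees, seed=0):
    """Map (h1, h2) of each subtree to the IDs of the trees containing it."""
    rng = random.Random(seed)
    taxa = sorted(set().union(*(c for t in trees for c in clades(t))))
    h1 = make_hash(len(taxa), 1_000_003, rng)  # illustrative modulus
    h2 = make_hash(len(taxa), 999_983, rng)    # illustrative modulus
    table = {}
    for tree_id, tree in enumerate(trees):
        for clade in clades(tree):
            if len(clade) == len(taxa):
                continue  # the root holds every taxon and carries no information
            bits = [t in clade for t in taxa]
            table.setdefault((h1(bits), h2(bits)), []).append(tree_id)
    return table


trees = ["((A,B),(C,D));", "((D,C),(B,A));", "((A,C),(B,D));"]
table = build_table(trees)
```

In this sketch the first two strings differ textually but describe the same subtrees, so they share table entries; only the third tree contributes new ones. That is the sense in which the compression is semantic rather than textual.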
Keywords :
cryptography; data compression; evolutionary computation; trees (mathematics); Newick representation; TreeZip; evolutionary trees; file compression; phylogenetic tree; semantic compression; subtrees; universal hashing functions; Compression algorithms; Computer science; Data compression; Data engineering; History; Organisms; Phylogeny; Robustness; Standards development; Testing; compression; evolutionary tree; phylogeny; trees; treezip;
Conference_Title :
Data Compression Conference (DCC), 2010
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-4244-6425-8
ISSN :
1068-0314
DOI :
10.1109/DCC.2010.64