DocumentCode :
2189592
Title :
TreeZip: A New Algorithm for Compressing Large Collections of Evolutionary Trees
Author :
Matthews, Suzanne J. ; Sul, Seung-Jin ; Williams, Tiffani L.
Author_Institution :
Dept. of Comput. Sci. & Eng., Texas A&M Univ., College Station, TX, USA
fYear :
2010
fDate :
24-26 March 2010
Firstpage :
544
Lastpage :
544
Abstract :
The primary advantage of TreeZip is its use of semantic compression, which allows us to uniquely store tree relationship information. Phylogenetic trees are stored in a format known as the Newick representation, which uses nested parentheses to represent the evolutionary relationships (or subtrees) within a phylogenetic tree. TreeZip uses two universal hashing functions to compactly represent all of the shared evolutionary relationships in the tree collection. In our previous work, we successfully used universal hash functions in our HashCS and HashRF algorithms, which build consensus trees and topological distance matrices, respectively. Once the hash table is constructed, TreeZip writes the compressed file describing the information contained in the collection of phylogenetic trees.
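The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that Newick strings use nested parentheses and that two universal hashing functions are used. Everything else here is an assumption made for illustration: the function names (newick_clades, make_universal_hash, build_table), the hash form (a sum of random per-taxon coefficients modulo a prime), the moduli m1 and m2, and the table layout mapping a hash pair to the indices of the trees containing that subtree. The parser handles plain topologies only, with no branch lengths or internal node labels.

```python
import random


def newick_clades(newick):
    """Return the taxa under each parenthesised group of a Newick string.

    Assumption: plain topologies such as "((A,B),(C,D));" only.
    """
    clades = []
    stack = []  # one set of taxa per currently open parenthesis
    name = ""
    for ch in newick:
        if ch in "(),;":
            if name:
                # A leaf belongs to every group that is still open.
                for group in stack:
                    group.add(name)
                name = ""
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                clades.append(frozenset(stack.pop()))
        elif not ch.isspace():
            name += ch
    return clades


def make_universal_hash(taxa, modulus, rng):
    """Build h(S) = (sum of a random coefficient per taxon in S) mod modulus.

    Hypothetical hash form chosen for this sketch.
    """
    coeff = {t: rng.randrange(modulus) for t in taxa}

    def h(clade):
        return sum(coeff[t] for t in clade) % modulus

    return h


def build_table(trees, m1=1009, m2=1000003, seed=0):
    """Map (h1, h2) of each non-trivial subtree to the trees containing it."""
    rng = random.Random(seed)
    parsed = [newick_clades(t) for t in trees]
    taxa = sorted(set().union(*(c for clades in parsed for c in clades)))
    h1 = make_universal_hash(taxa, m1, rng)  # plays the role of a table index
    h2 = make_universal_hash(taxa, m2, rng)  # plays the role of a fingerprint
    table = {}
    for idx, clades in enumerate(parsed):
        for clade in clades:
            if len(clade) == len(taxa):
                continue  # the group holding every taxon carries no information
            table.setdefault((h1(clade), h2(clade)), []).append(idx)
    return table
```

Calling build_table(["((A,B),(C,D));", "((A,B),(D,C));"]) stores the subtree {A, B} once, with both tree indices attached. That is the sense in which a relationship shared across the collection needs only a single entry.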
Keywords :
cryptography; data compression; evolutionary computation; trees (mathematics); Newick representation; TreeZip; evolutionary trees; file compression; phylogenetic tree; semantic compression; subtrees; universal hashing functions; Compression algorithms; Computer science; Data compression; Data engineering; History; Organisms; Phylogeny; Robustness; Standards development; Testing; compression; evolutionary tree; phylogeny; trees; treezip;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2010
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4244-6425-8
Electronic_ISBN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2010.64
Filename :
5453499