DocumentCode :
2684726
Title :
Protein is incompressible
Author :
Nevill-Manning, Craig G. ; Witten, Ian H.
Author_Institution :
Rutgers Univ., Piscataway, NJ, USA
fYear :
1999
fDate :
29-31 Mar 1999
Firstpage :
257
Lastpage :
266
Abstract :
Life is based on two polymers, DNA and protein, whose properties can be described in a simple text file. It is natural to expect that standard text compression techniques would work on biological sequences as they do on English text. But biological sequences have a fundamentally different structure from linguistic ones, and standard compression schemes exhibit disappointing performance on them. We describe a new approach to compression that takes account of the underlying biochemical principles. This gives rise to a generalization of blending for statistical compressors where every context is used, weighted by its similarity to the current context. Results support what research in bioinformatics has shown, that there is little Markov dependency in protein. This cripples data compression schemes and reduces them to order zero models.
Keywords :
DNA; biology computing; data compression; polymers; proteins; statistical analysis; biochemical principles; biological sequences; blending; context weighting; order zero models; protein; similarity weighting; statistical compressors; Bioinformatics; Compressors; Computer science; Databases; Genetic mutations; Organisms; Sampling methods; Sequences
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-0096-X
Type :
conf
DOI :
10.1109/DCC.1999.755675
Filename :
755675