DocumentCode :
1594962
Title :
Probing the Randomness of Proteins by Their Subsequence Composition
Author :
Apostolico, Alberto ; Cunial, Fabio
Author_Institution :
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA
fYear :
2009
Firstpage :
173
Lastpage :
182
Abstract :
The quantitative underpinning of the information content of biosequences represents an elusive goal and yet also an obvious prerequisite to the quantitative modeling and study of biological function and evolution. Previous studies have consistently exposed a tenacious lack of compressibility on the part of biosequences. This leaves open the question of what distinguishes them from random strings, the latter being clearly unpalatable to the living cell. This paper assesses the randomness of biosequences in terms of newly introduced parameters that relate to the vocabulary of their (suitably constrained) subsequences rather than their substrings. Results from experiments show the potential of the method in distinguishing a protein sequence from its random reshuffling, as well as in tasks of classification and clustering.
Keywords :
biology computing; evaporation model; evolution (biological); pattern classification; pattern clustering; proteins; biological evolution; biological function; biosequences; classification; clustering; information content; protein sequence; random reshuffling; random string; subsequence composition; vocabulary; Biological information theory; Biological system modeling; DNA; Data compression; Educational institutions; Evolution (biology); Genetics; Organisms; Protein sequence; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 2009. DCC '09.
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4244-3753-5
Type :
conf
DOI :
10.1109/DCC.2009.60
Filename :
4976461