Title of article :
Information Content of Protein Sequences
Author/Authors :
WEISS, OLAF; JIMÉNEZ-MONTAÑO, MIGUEL A.; HERZEL, HANSPETER
Issue Information :
Journal issue with serial number, year 2000
Pages :
8
From page :
379
To page :
386
Abstract :
The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy and by applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences: the entropy reduction due to correlations is only about 1%. However, precise estimates of the entropy of the source are not possible because of finite-sample effects. Compression algorithms also indicate that the redundancy is on the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the observed redundancy. The findings are related to numerical and biochemical experiments with random polypeptides.
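As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch, not the authors' method. It assumes a plug-in (frequency-count) estimator of block entropy, a composition-preserving shuffle as the random surrogate, and zlib as a stand-in compressor; the function names and the toy sequence are hypothetical and chosen only for this example.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq: str, n: int = 1) -> float:
    """Plug-in Shannon entropy (in bits) of the overlapping n-blocks of seq."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shuffled_surrogate(seq: str, seed: int = 0) -> str:
    """Random surrogate: same amino-acid composition, correlations destroyed."""
    rng = random.Random(seed)
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def compressed_bits_per_symbol(seq: str) -> float:
    """Upper bound on complexity per symbol using zlib (assumed compressor)."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9)) / len(seq)


# Hypothetical toy input: a short, strongly periodic string, NOT real protein data.
toy = "MKTAYIAKQR" * 20
surrogate = shuffled_surrogate(toy)

# Conditional entropy h2 = H2 - H1: uncertainty of a symbol given its predecessor.
h2_toy = block_entropy(toy, 2) - block_entropy(toy, 1)
h2_surrogate = block_entropy(surrogate, 2) - block_entropy(surrogate, 1)

if __name__ == "__main__":
    print("H1 original  :", block_entropy(toy, 1))
    print("H1 surrogate :", block_entropy(surrogate, 1))
    print("h2 original  :", h2_toy)
    print("h2 surrogate :", h2_surrogate)
    print("zlib bits/sym:", compressed_bits_per_symbol(toy),
          compressed_bits_per_symbol(surrogate))
```

The single-symbol entropy is identical for the sequence and its surrogate, since shuffling preserves composition; any gap in the higher-order or compression-based estimates reflects correlations. The toy string is deliberately periodic, so its gap is far larger than the roughly 1% reported for proteins, and plug-in estimates for larger blocks are biased by the finite-sample effects the abstract mentions.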
Journal title :
Journal of Theoretical Biology
Serial Year :
2000
Record number :
1534439