Regional Information Center for Science and Technology

Title of article :

Information Content of Protein Sequences

Author/Authors :

WEISS, OLAF and JIMÉNEZ-MONTAÑO, MIGUEL A. and HERZEL, HANSPETER

Issue Information :

Journal issue with serial number, year 2000

Pages :

From page :

379

To page :

386

Abstract :

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides.
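The three measurements named in the abstract (an entropy estimate, a compression-based estimate, and a comparison against randomized surrogates) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the plug-in block-entropy estimator, the use of `zlib` as the compressor, and the function names `block_entropy`, `compressed_bits_per_symbol`, and `shuffled_surrogate` are all assumptions chosen for illustration, and the record does not say which estimators or compressors the paper used.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq, n=1):
    """Plug-in Shannon entropy (in bits) of overlapping length-n blocks.

    Assumption: a naive frequency-count estimator. It is biased for
    short sequences, which is the finite-sample problem the abstract
    mentions.
    """
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compressed_bits_per_symbol(seq):
    """Rough upper bound on complexity: zlib output size per residue.

    Assumption: zlib stands in for whatever compressor the paper used.
    """
    return 8 * len(zlib.compress(seq.encode("ascii"), 9)) / len(seq)


def shuffled_surrogate(seq, seed=0):
    """Random surrogate with the same residue composition as seq."""
    chars = list(seq)
    random.Random(seed).shuffle(chars)
    return "".join(chars)
```

Applying both estimators to a sequence and to its shuffled surrogate shows how much of the measured redundancy comes from residue order rather than composition. The single-residue entropy is identical for both by construction, so any difference must appear in longer blocks (for example `block_entropy(seq, 2) - block_entropy(seq, 1)`) or in the compressed size.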

Journal title :

Journal of Theoretical Biology

Serial Year :

2000

Record number :

1534439

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=1534439