Title :
On complexity measures for biological sequences
Author :
Nan, Fei ; Adjeroh, Donald
Author_Institution :
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
Abstract :
In this work, we perform an empirical study of several published complexity measures for general sequences, to determine how effective they are on biological sequences. By effectiveness, we mean how well a given complexity measure identifies known, biologically relevant relationships, such as closeness on a phylogenetic tree. In particular, we study three complexity measures: the traditional Shannon's entropy, linguistic complexity, and T-complexity. For each complexity measure, we construct the complexity profile of each sequence in our test set, and we then compare the sequences by their profiles using three performance measures: (i) the information-theoretic divergence measure of relative entropy; (ii) apparent periodicity in the complexity profile; and (iii) correct phylogeny. The preliminary results show that T-complexity was the least effective in identifying previously established associations between the sequences in our test set. Shannon's entropy and linguistic complexity provided better results, with Shannon's entropy having the upper hand.
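The profile-and-compare procedure described in the abstract can be illustrated with a minimal sketch for the Shannon entropy case. The abstract does not state how the profiles are built or how relative entropy is applied to them, so everything specific here is an assumption made for illustration: the sliding window, its size and step, the normalization of a profile into a distribution, the smoothing constant, and all function names (`shannon_entropy`, `entropy_profile`, `normalize`, `relative_entropy`).

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy, in bits per symbol, of the symbol frequencies in s."""
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(seq, window=4, step=1):
    """Entropy of each sliding window over seq.

    The window size and step are illustrative choices, not values
    taken from the paper.
    """
    return [shannon_entropy(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def normalize(profile, eps=1e-9):
    """Turn a profile into a probability distribution.

    Assumption: eps is added to every entry so that no probability is
    zero and the relative entropy below stays finite.
    """
    total = sum(profile) + eps * len(profile)
    return [(v + eps) / total for v in profile]


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Two toy sequences of equal length (hypothetical data).
    a = entropy_profile("ACGTACGTAAAA")
    b = entropy_profile("AAAAAAAAACGT")
    print(relative_entropy(normalize(a), normalize(b)))
```

Relative entropy is not symmetric, so `relative_entropy(p, q)` and `relative_entropy(q, p)` generally differ; this sketch also requires both sequences to yield profiles of the same length.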
Keywords :
biology computing; computational complexity; computational linguistics; entropy; Shannon entropy; T-complexity; biological sequences; complexity measures; information theoretic divergence measure; linguistic complexity; phylogenetic tree; relative entropy; Bioinformatics; Biological information theory; Computer science; DNA; Entropy; Genomics; Organisms; Particle measurements; Sequences; Testing
Conference_Title :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
DOI :
10.1109/CSB.2004.1332483