On similarity codes

Author

D'yachkov, Arkadii G.; Torney, David C.

Author_Institution

Dept. of Probability Theory, Moscow State Univ., Russia

Volume

46

Issue

4

fYear

2000

fDate

7/1/2000 12:00:00 AM

Firstpage

1558

Lastpage

1564

Abstract

We introduce a biologically motivated measure of sequence similarity for quaternary N-sequences, extending Hamming similarity. This measure is the sum, over all positions of the sequences, of “alphabetic” similarities. Alphabetic similarities are defined symmetrically on the Cartesian square of the alphabet, and they equal zero whenever the two elements differ. In contrast to Hamming similarity, however, our alphabetic similarities may take a distinct value for each letter when the two elements are identical. In this correspondence we derive lower and upper bounds on the rate of the corresponding quaternary nonlinear and linear codes, called similarity codes, which are applied to DNA sequences.
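
The measure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alphabet letters A, C, G, T, the numeric weights in WEIGHTS, and the function names are hypothetical and are not taken from the paper, which gives no numeric values in this abstract.

```python
from typing import Mapping

# Hypothetical per-letter weights (assumption): the similarity of a letter
# with itself. The abstract gives no numeric values; these are illustrative.
WEIGHTS = {"A": 1, "C": 2, "G": 3, "T": 4}


def similarity(x: str, y: str, weights: Mapping[str, float] = WEIGHTS) -> float:
    """Sum of alphabetic similarities over all positions of two N-sequences.

    A position contributes weights[a] when both sequences carry the same
    letter a, and zero when the letters differ.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(weights[a] for a, b in zip(x, y) if a == b)


def hamming_similarity(x: str, y: str) -> float:
    """Special case with every weight equal to 1: the number of matching positions."""
    return similarity(x, y, {letter: 1 for letter in "ACGT"})
```

With these assumed weights, similarity("ACGT", "ACGA") sums the weights of the three matching letters A, C and G, while hamming_similarity on the same pair simply counts the three matches.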

Keywords

DNA; fractals; linear codes; nonlinear codes; random codes; sequences; Cartesian square; DNA sequences; Hamming similarity; alphabetic similarities; biologically motivated measure; code rate; cross-similarity; lower bound; quaternary N-sequences; quaternary linear codes; quaternary nonlinear codes; random coding bound; self similarity; sequence length; sequence similarity; similarity codes; upper bound; Application software; Computer science; Cryptography; Error correction codes; Information theory; Libraries; Linear code; Web server

fLanguage

English

Journal_Title

Information Theory, IEEE Transactions on

Publisher

IEEE

ISSN

0018-9448

Type

jour

DOI

10.1109/18.850695

Filename

850695