Author/Authors :
M.V. and Dyachkov, Arkadii and Torney, David and Vilenkin, Pavel and White, Scott
Abstract :
We discuss a general notion of a similarity function between two sequences that is based on their common subsequences. This notion arises in some applications of molecular biology [A.G. Dʹyachkov, P.L. Erdős, A.J. Macula, V.V. Rykov, D.C. Torney, C.-S. Tung, P.A. Vilenkin, and P.S. White, Exordium for DNA codes, Journal of Combinatorial Optimization 7 (4) (2003)]. We introduce the concept of similarity codes and study the logarithmic asymptotics for the size of optimal codes. Our mathematical results, announced in [A.G. Dʹyachkov, D.C. Torney, P.A. Vilenkin, and P.S. White, On a class of codes for the insertion-deletion metric, Proc. of ISIT-2002, Lausanne, Switzerland, July 2002], correspond to the longest common subsequence (LCS) similarity function [V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, J. Soviet Phys. Doklady, 10, 707-710, 1966], which leads to a special subclass of these codes called reverse-complement (RC) similarity codes. RC codes for additive similarity functions have been studied in previous papers [A.G. Dʹyachkov and D.C. Torney, On similarity codes, IEEE Trans. Inform. Theory 46 (4) (2000) 1558-1564], [A.G. Dʹyachkov, D.C. Torney, P.A. Vilenkin, and P.S. White, Reverse-complement similarity codes for DNA sequences, Proc. of ISIT-2000, Sorrento, Italy, July 2000], [P.A. Vilenkin, Some asymptotic problems of combinatorial coding theory and information theory (in Russian), Ph.D. dissertation, Moscow State University, 2000], [V.V. Rykov, A.J. Macula, C.M. Korzelius, D.C. Engelhart, D.C. Torney, and P.S. White, DNA sequences constructed on the basis of quaternary cyclic codes, Proc. of 4th World Multiconference on Systemics, Cybernetics and Informatics, Orlando, Florida, USA, July 2000].
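As an illustration of the LCS similarity function mentioned in the abstract, the following is a minimal Python sketch. It is not taken from the paper: the function names `lcs_length`, `reverse_complement`, and `rc_similarity` are hypothetical. Treating the RC similarity of two DNA words as the LCS length between one word and the reverse complement of the other is an assumption made here for illustration only.

```python
# Illustrative sketch only; names and the exact RC definition are assumptions,
# not the paper's notation.

def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y (row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Watson-Crick complement pairs for the DNA alphabet.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(x: str) -> str:
    """Reverse the word and complement each letter."""
    return "".join(COMPLEMENT[c] for c in reversed(x))

def rc_similarity(x: str, y: str) -> int:
    """Assumed RC similarity: LCS length between x and the reverse complement of y."""
    return lcs_length(x, reverse_complement(y))
```

Under this reading, a word and its own reverse complement have the largest possible RC similarity, namely the word length, which is the kind of pair an RC similarity code would be designed to avoid.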
Keywords :
sequences, similarity, DNA sequences, code distance, rate of codes, insertion-deletion codes, codes, subsequences