Title of article :
Euler circuits and DNA sequencing by hybridization (Original Research Article)
Author/Authors :
Richard Arratia , Béla Bollobás , Don Coppersmith , Gregory B. Sorkin
Issue Information :
Journal with serial number, year 2000
Pages :
34
From page :
63
To page :
96
Abstract :
Sequencing by hybridization is a method of reconstructing a long DNA string (that is, figuring out its nucleotide sequence) from knowledge of its short substrings. Unique reconstruction is not always possible, and the goal of this paper is to study the number of reconstructions of a random string. For a given string, the number of reconstructions is determined by the pattern of repeated substrings; in an appropriate limit substrings will occur at most twice, so the pattern of repeats is given by a pairing: a string of length 2n in which each symbol occurs twice. A pairing induces a 2-in, 2-out graph, whose directed edges are defined by successive symbols of the pairing (for example, the pairing ABBCAC induces the graph with edges AB, BB, BC, and so forth), and the number of reconstructions is simply the number of Euler circuits in this 2-in, 2-out graph. The original problem is thus transformed into one about pairings: to find the number f_k(n) of n-symbol pairings having k Euler circuits. We show how to compute this function, in closed form, for any fixed k, and we present the functions explicitly for k=1,…,9. The key is a decomposition theorem: the Euler "circuit number" of a pairing is the product of the circuit numbers of "component" sub-pairings. These components come from connected components of the "interlace graph", which has the pairing's symbols as vertices, and edges when symbols are "interlaced". (A and B are interlaced if the pairing has the form ⋯A⋯B⋯A⋯B⋯ or ⋯B⋯A⋯B⋯A⋯.) We carry these results back to the original question about DNA strings, and provide a total variation distance upper bound for the approximation error.
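The objects named in the abstract can be illustrated with a small brute-force sketch. It is not the paper's method, only a check on tiny inputs, and it rests on assumptions not stated in the abstract: the pairing is read cyclically (the last symbol is joined back to the first so that every vertex is 2-in, 2-out), parallel edges are treated as distinguishable, and a "component" sub-pairing is taken to be the pairing with all symbols outside the component deleted. The function names `count_euler_circuits`, `interlace_components`, and `restrict` are hypothetical.

```python
from math import prod


def count_euler_circuits(pairing):
    """Count Euler circuits of the graph induced by a pairing (brute force)."""
    m = len(pairing)
    if m == 0:
        return 1
    # Edge i runs from symbol i to symbol i+1; the wrap-around edge is an
    # assumption made here so that every vertex has in- and out-degree 2.
    edges = [(pairing[i], pairing[(i + 1) % m]) for i in range(m)]
    out_edges = {}
    for idx, (tail, _head) in enumerate(edges):
        out_edges.setdefault(tail, []).append(idx)
    used = [False] * m

    def extend(vertex, remaining):
        # Every edge used: the trail has closed up, so count one circuit.
        if remaining == 0:
            return 1
        total = 0
        for idx in out_edges[vertex]:
            if not used[idx]:
                used[idx] = True
                total += extend(edges[idx][1], remaining - 1)
                used[idx] = False
        return total

    # Fix edge 0 as the first edge so each circuit is counted exactly once.
    used[0] = True
    return extend(edges[0][1], m - 1)


def interlace_components(pairing):
    """Return the connected components of the interlace graph, as sorted lists."""
    positions = {}
    for i, sym in enumerate(pairing):
        positions.setdefault(sym, []).append(i)
    symbols = sorted(positions)

    def interlaced(a, b):
        (a1, a2), (b1, b2) = positions[a], positions[b]
        return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2

    seen, components = set(), []
    for start in symbols:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for other in symbols:
                if other not in seen and interlaced(cur, other):
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(comp))
    return components


def restrict(pairing, symbols):
    """Sub-pairing obtained by deleting every symbol not in `symbols`."""
    return "".join(ch for ch in pairing if ch in symbols)


pairing = "ABBCAC"
components = interlace_components(pairing)
product = prod(count_euler_circuits(restrict(pairing, c)) for c in components)
print(count_euler_circuits(pairing), components, product)
```

For the abstract's example ABBCAC, only A and C are interlaced, so the components are {A, C} and {B}; under the assumptions above the direct count and the product over components agree, which is the behaviour the decomposition theorem describes. The search is exponential in the worst case and is meant only for short pairings.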
Keywords :
Matrix-tree theorem , Pairing , BEST theorem , Circle graph , Catalan number , Interlace graph , Euler path , DNA sequencing , Euler circuit , Combinatorial enumeration , Hybridization
Journal title :
Discrete Applied Mathematics
Serial Year :
2000
Record number :
885113