Towards a DNA sequencing theory (learning a string)

Author

Li, Ming

Author_Institution

Waterloo Univ., Ont., Canada

fYear

1990

fDate

22-24 Oct 1990

Firstpage

125

Abstract

Mathematical frameworks suitable for massive automated DNA sequencing and for analyzing DNA sequencing algorithms are studied under plausible assumptions. The DNA sequencing problem is modeled as learning a superstring from its randomly drawn substrings. Under certain restrictions, this may be viewed as learning a superstring in L.G. Valiant's (1984) learning model, and in this case the author gives an efficient algorithm for learning a superstring and a quantitative bound on how many samples suffice. A major obstacle to the approach turns out to be a well-known open question on how to approximate the shortest common superstring of a set of strings. The author presents the first provably good algorithm that approximates the shortest superstring of length n by a superstring of length O(n log n).
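
To make the shortest common superstring problem concrete, here is a minimal Python sketch of the classic greedy overlap-merge heuristic. It is an illustration only and is not claimed to be the algorithm of the paper or to achieve the O(n log n) bound stated in the abstract. The function names `overlap` and `greedy_superstring` and the sample fragments are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Return one string containing every fragment as a substring.

    Illustrative greedy heuristic (an assumption, not the paper's method):
    repeatedly merge the ordered pair with the largest overlap.
    """
    # Drop exact duplicates, then fragments already contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the shared overlap only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)

    return frags[0] if frags else ""


if __name__ == "__main__":
    reads = ["ACGT", "GTAC", "CGTA"]  # hypothetical sample fragments
    s = greedy_superstring(reads)
    print(s, all(r in s for r in reads))
```

The result is always a common superstring of the input, but the heuristic gives no guarantee of minimum length; each merge scans all ordered pairs, which keeps the sketch simple rather than fast.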

Keywords

DNA; biology computing; learning systems; merging; search problems; DNA sequencing; efficient algorithm; randomly drawn substrings; samples; shortest common superstring; superstring learning; Approximation algorithms; Bioinformatics; Genomics; Humans; Laboratories; Machine learning; Machine learning algorithms; Postal services; Sequences

fLanguage

English

Publisher

IEEE

Conference_Titel

Foundations of Computer Science, 1990. Proceedings., 31st Annual Symposium on

Conference_Location

St. Louis, MO

Print_ISBN

0-8186-2082-X

Type

conf

DOI

10.1109/FSCS.1990.89531

Filename

89531