• DocumentCode
    2777232
  • Title

    Towards a DNA sequencing theory (learning a string)

  • Author

    Li, Ming

  • Author_Institution
    Waterloo Univ., Ont., Canada
  • fYear
    1990
  • fDate
    22-24 Oct 1990
  • Firstpage
    125
  • Abstract
    Mathematical frameworks suitable for massive automated DNA sequencing and for analyzing DNA sequencing algorithms are studied under plausible assumptions. The DNA sequencing problem is modeled as learning a superstring from its randomly drawn substrings. Under certain restrictions, this may be viewed as learning a superstring in L.G. Valiant's (1984) learning model, and in this case the author gives an efficient algorithm for learning a superstring and a quantitative bound on how many samples suffice. A major obstacle to the approach turns out to be a quite well-known open question on how to approximate the shortest common superstring of a set of strings. The author presents the first provably good algorithm that approximates the shortest superstring of length n by a superstring of length O(n log n).
  • Keywords
    DNA; biology computing; learning systems; merging; search problems; DNA sequencing; efficient algorithm; randomly drawn substrings; samples; shortest common superstring; superstring learning; Approximation algorithms; Bioinformatics; Genomics; Humans; Laboratories; Machine learning; Machine learning algorithms; Postal services; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Foundations of Computer Science, 1990. Proceedings., 31st Annual Symposium on
  • Conference_Location
    St. Louis, MO
  • Print_ISBN
    0-8186-2082-X
  • Type

    conf

  • DOI
    10.1109/FSCS.1990.89531
  • Filename
    89531