• DocumentCode
    1052111
  • Title

    Data management and analysis for high-throughput DNA sequencing projects

  • Author

    Kerlavage, A.R. ; FitzHugh, Will ; Gladek, A. ; Kelley, John ; Scott, John ; Shirley, Robert ; Sutton, Granger ; Wai-Chiu, Man ; White, Owen ; Adams, David

  • Author_Institution
    Dept. of Bioinf., Inst. for Genomic Res., Gaithersburg, MD, USA
  • Volume
    14
  • Issue
    6
  • fYear
    1995
  • Firstpage
    710
  • Lastpage
    717
  • Abstract
    The rapid advances in molecular biology have begun to shift many of the bottlenecks in genome research from the laboratory to the data analysis facility. The pace at which this has occurred creates a situation in which software development always has to catch up with the flow of data. Since such large-scale processes were not anticipated, the analysis infrastructure has not been fully established. Furthermore, most systems that have been built were designed by the biologists who collected the data. More recently, computer scientists, mathematicians, and engineers have taken an interest in this problem. This has had a positive effect, since it has created a tight synergy between the informatics and the biology. Several principles affected the design of the system developed at TIGR. Each of the sample preparation, sequencing, and analysis steps had to be managed, scheduled, and tracked. This information had to be made readily available to those who needed it for carrying out their tasks. Different skill levels of the users had to be taken into account. The degree of human intervention at each step had to be evaluated and built into the design. A mixed processing environment of Macintosh and Unix platforms had to be integrated. Most importantly, the system had to save time, reduce error, and ensure uniformity of the analysis and quality of the results. In the authors' experience, the tools they have built work well because of their early decisions as to which systems to use for development. The authors settled on a robust relational database management system (Sybase) and a portable development environment (C, C++).
  • Keywords
    DNA; biology computing; genetics; laboratory techniques; relational databases; Macintosh; Sybase; Unix; analysis infrastructure; genome research; high-throughput DNA sequencing projects; informatics; laboratory data analysis; mixed processing environment; molecular biology; portable development environment; relational database management system; sample preparation; Bioinformatics; Biology computing; DNA; Data analysis; Genomics; Humans; Informatics; Laboratories; Large-scale systems; Programming
  • fLanguage
    English
  • Journal_Title
    Engineering in Medicine and Biology Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    0739-5175
  • Type

    jour

  • DOI
    10.1109/51.473264
  • Filename
    473264