• DocumentCode
    1497747
  • Title
    Scaffold Filling under the Breakpoint and Related Distances
  • Author
    Jiang, Haitao ; Zheng, Chunfang ; Sankoff, David ; Zhu, Binhai
  • Author_Institution
    School of Computer Science and Technology, Shandong University, Jinan, China
  • Volume
    9
  • Issue
    4
  • fYear
    2012
  • Firstpage
    1220
  • Lastpage
    1229
  • Abstract
    Motivated by the trend of genome sequencing without completing the sequence of whole genomes, a problem on filling an incomplete multichromosomal genome (or scaffold) I with respect to a complete target genome G was studied. The objective is to minimize the resulting genomic distance between I′ and G, where I′ is the corresponding filled scaffold. We call this problem the one-sided scaffold filling problem. In this paper, we conduct a systematic study of the scaffold filling problem under the breakpoint distance and its variants, for both unichromosomal and multichromosomal genomes (with and without gene repetitions). When the input genome contains no gene repetition (i.e., is a fragment of a permutation), we show that the two-sided scaffold filling problem (i.e., G is also incomplete) is polynomially solvable for unichromosomal genomes under the breakpoint distance and for multichromosomal genomes under the genomic (or DCJ, Double-Cut-and-Join) distance. However, when the input genome contains some repeated genes, even the one-sided scaffold filling problem becomes NP-complete when the similarity measure is the maximum number of adjacencies between two sequences. For this problem, we also present efficient constant-factor approximation algorithms: factor 2 for the general case and factor 1.33 for the one-sided case.
  • Keywords
    biology computing; cellular biophysics; computational complexity; genomics; NP-complete problem; breakpoint distance; constant-factor approximation algorithms; genome sequencing; genomic distance; incomplete multichromosomal genome I; one-sided scaffold filling problem; two-sided scaffold filling problem; unichromosomal genomes; Approximation algorithms; Approximation methods; Bioinformatics; Computational biology; Educational institutions; Polynomials; Comparative genomics; DCJ; NP-completeness; algorithms; scaffold filling; Genes; Genome; Sequence Analysis, DNA
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2012.57
  • Filename
    6185533