• DocumentCode
    1300513
  • Title

    Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences

  • Author

    Biing-Feng Wang

  • Author_Institution
    Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
  • Volume
    9
  • Issue
    2
  • fYear
    2012
  • Firstpage
    548
  • Lastpage
    559
  • Abstract
    The focus of this paper is the problem of finding all nested common intervals of two general sequences. Depending on the treatment one wants to apply to duplicate genes, Blin et al. introduced three models to define nested common intervals of two sequences: the uniqueness, the free-inclusion, and the bijection models. We consider all three models. For the uniqueness and the bijection models, we give $O(n + N_{out})$-time algorithms, where $N_{out}$ denotes the size of the output. For the free-inclusion model, we give an $O(n^{1+\varepsilon} + N_{out})$-time algorithm, where $\varepsilon > 0$ is an arbitrarily small constant. We also present an upper bound on the size of the output for each model. For the uniqueness and the free-inclusion models, we show that $N_{out} = O(n^2)$. Let $C = \sum_{g \in \Gamma} o_1(g)\, o_2(g)$, where $\Gamma$ is the set of distinct genes, and $o_1(g)$ and $o_2(g)$ are, respectively, the numbers of copies of $g$ in the two given sequences. For the bijection model, we show that $N_{out} = O(Cn)$. In this paper, we also study the problem of finding all approximate nested common intervals of two sequences under the bijection model. An $O(\delta n + N_{out})$-time algorithm is presented, where $\delta$ denotes the maximum number of allowed gaps. In addition, we show that for this problem $N_{out}$ is $O(\delta n^3)$.
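The quantity C defined in the abstract can be illustrated with a short sketch. This is not the paper's algorithm; it only evaluates the sum of o1(g) * o2(g) over all genes for two input sequences. The function name `bijection_bound_constant` and the representation of genes as hashable items in a list are assumptions made for illustration.

```python
from collections import Counter


def bijection_bound_constant(seq1, seq2):
    """Return C = sum over genes g of o1(g) * o2(g).

    o1(g) and o2(g) are the numbers of copies of gene g in seq1 and seq2.
    The function name and the input representation are hypothetical.
    """
    o1 = Counter(seq1)
    o2 = Counter(seq2)
    # A gene missing from seq2 has o2[g] == 0 and contributes nothing.
    return sum(count * o2[gene] for gene, count in o1.items())
```

When neither sequence contains duplicate genes, each shared gene contributes exactly 1, so C is at most n and the stated bound O(Cn) agrees with the O(n^2) bound given for the other two models.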
  • Keywords
    biology computing; data structures; genetics; genomics; bijection models; conserved gene clusters; duplicate genes; free-inclusion model; general sequences; nested common intervals; output-sensitive algorithms; time algorithms; Algorithm design and analysis; Approximation algorithms; Bioinformatics; Biological system modeling; Computational biology; Computational modeling; Algorithms; common intervals; comparative genomics; Multigene Family; Sequence Analysis, DNA
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2011.112
  • Filename
    5989789