• DocumentCode
    471844
  • Title
    Using Suffix Tree to Discover Complex Repetitive Patterns in DNA Sequences
  • Author
    He, Dan
  • Author_Institution
    Dept. of Comput. Sci., Vermont Univ., Burlington, VT
  • fYear
    2006
  • fDate
    Aug. 30 2006-Sept. 3 2006
  • Firstpage
    3474
  • Lastpage
    3477
  • Abstract
    The discovery of repetitive patterns is a fundamental problem in bioinformatics. It remains a challenging open problem because most existing methods, such as using an annotated repeat database or extracting pairs of maximum repeated regions, cannot give a correct definition that incorporates both the length and the frequency of the repetitive patterns. One existing algorithm does consider both pattern length and frequency. However, it can find only simple "elementary" repeats and cannot reveal the complex structure of the repetitive patterns. Furthermore, its time complexity of O(n²f), where n is the length of the sequence and f is the minimum frequency requirement, may still be too high for long DNA sequences. In this paper, we propose a novel algorithm that uses a suffix tree to reveal the complex structure of the repetitive patterns in DNA sequences. We show that our algorithm achieves an O(n2f2 ) time complexity
  • Keywords
    DNA; biology computing; computational complexity; molecular biophysics; trees (mathematics); DNA sequences; annotated repeat database; bioinformatics; complex repetitive patterns; pattern frequency; pattern length; suffix tree; time complexity; Bioinformatics; Databases; Diseases; Frequency; Genetics; Genomics; Humans; Libraries; Sequences; complex structure; elementary repeats; repetitive patterns;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE
  • Conference_Location
    New York, NY
  • ISSN
    1557-170X
  • Print_ISBN
    1-4244-0032-5
  • Electronic_ISBN
    1557-170X
  • Type
    conf
  • DOI
    10.1109/IEMBS.2006.260445
  • Filename
    4462544
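The abstract does not spell out the algorithm itself. As a minimal sketch of how a suffix-based index can expose repeats that satisfy both a length and a frequency threshold, the code below uses a suffix array plus an LCP array (a common, more compact stand-in for a suffix tree). This is not the paper's suffix-tree algorithm. The function names (suffix_array, lcp_array, frequent_repeats), the thresholds min_len and min_freq, and the toy sequence are hypothetical, and the sketch only lists frequent repeated substrings; it does not recover the "complex structure" of repeats that the paper targets.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    # O(n^2 log n); fine for illustration, not for genome-scale input.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    # Kasai et al.: lcp[k] is the length of the longest common prefix of
    # suffixes sa[k-1] and sa[k]; lcp[0] is 0 by convention.
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h-1 characters
    return lcp


def frequent_repeats(s: str, min_len: int, min_freq: int) -> dict[str, int]:
    # Hypothetical helper: report substrings of length >= min_len that occur
    # >= min_freq times (overlaps allowed), mapped to their occurrence counts.
    # For each run of min_freq consecutive suffixes in suffix-array order,
    # the minimum LCP over the run is the length of the prefix they all share.
    if min_len < 1 or min_freq < 1:
        raise ValueError("min_len and min_freq must be at least 1")
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    found: dict[str, int] = {}
    for start in range(len(sa) - min_freq + 1):
        shared = min(lcp[start + 1:start + min_freq], default=len(s) - sa[start])
        if shared >= min_len:
            pattern = s[sa[start]:sa[start] + shared]
            # Count every (possibly overlapping) occurrence of the pattern.
            found[pattern] = sum(s.startswith(pattern, i) for i in range(len(s)))
    return found


if __name__ == "__main__":
    seq = "ACGTACGTACGA"  # hypothetical toy sequence
    print(frequent_repeats(seq, min_len=3, min_freq=3))
    print(frequent_repeats(seq, min_len=4, min_freq=2))
```

Running the script prints {'ACG': 3} and then {'ACGTACG': 2, 'CGTACG': 2, 'GTACG': 2, 'TACG': 2}. The second result shows that this simple sketch reports nested, overlapping candidates separately and does not group them into larger structures.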