  • DocumentCode
    1990585
  • Title
    Mining Frequent Contiguous Sequence Patterns in Biological Sequences
  • Author
    Kang, Tae Ho; Yoo, Jae Soo; Kim, Hak Yong
  • Author_Institution
    Chungbuk Nat. Univ., Cheongju
  • fYear
    2007
  • fDate
    14-17 Oct. 2007
  • Firstpage
    723
  • Lastpage
    728
  • Abstract
    Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They contain contiguous sequences that often consist of hundreds of frequent items or more. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Recently, the MacosVSpan algorithm was proposed. It builds on the idea of the PrefixSpan algorithm and significantly reduces PrefixSpan's recursive processing. However, MacosVSpan is inefficient for mining frequent contiguous sequences from long biological sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological sequence data by constructing a spanning tree of fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
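    To illustrate what mining frequent and maximal contiguous sequence patterns means, here is a minimal sketch. It is a simple level-wise baseline written for illustration only. It is not the spanning-tree method proposed in the paper, and it is not MacosVSpan or PrefixSpan. Assumptions: sequences are Python strings over an alphabet such as ACGT, and the support of a pattern is the number of sequences that contain it at least once. The function names and the optional `max_len` cap are hypothetical. The sketch relies on the fact that the prefix of a contiguous pattern is always at least as frequent as the pattern itself, so patterns are grown one item at a time, and only from frequent prefixes.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_sup, max_len=None):
    """Hypothetical helper (illustrative baseline, not the paper's method).

    Returns {pattern: support} for every contiguous pattern (substring)
    that occurs in at least `min_sup` distinct input sequences.
    """
    result = {}
    # occurrences: pattern -> {sequence index -> [start positions]}
    level = defaultdict(lambda: defaultdict(list))
    for i, s in enumerate(sequences):
        for p, item in enumerate(s):
            level[item][i].append(p)
    length = 1  # length of every pattern currently in `level`
    while level:
        # Support = number of distinct sequences containing the pattern.
        frequent = {pat: occ for pat, occ in level.items() if len(occ) >= min_sup}
        for pat, occ in frequent.items():
            result[pat] = len(occ)
        if max_len is not None and length >= max_len:
            break
        # Grow each frequent pattern by the item right after each occurrence;
        # infrequent patterns are never extended (their extensions can't be frequent).
        nxt = defaultdict(lambda: defaultdict(list))
        for pat, occ in frequent.items():
            for i, starts in occ.items():
                s = sequences[i]
                for p in starts:
                    end = p + length
                    if end < len(s):
                        nxt[pat + s[end]][i].append(p)
        level = nxt
        length += 1
    return result


def maximal(patterns):
    """Hypothetical helper: keep patterns not contained in a longer frequent one."""
    return {
        p: sup
        for p, sup in patterns.items()
        if not any(p != q and p in q for q in patterns)
    }


if __name__ == "__main__":
    seqs = ["ACGTAC", "CGTACG", "TTACGT"]
    pats = frequent_contiguous(seqs, min_sup=3)
    print(maximal(pats))
```

    With the three toy sequences `ACGTAC`, `CGTACG`, and `TTACGT` and `min_sup = 3`, the maximal patterns are `ACG`, `CGT`, and `TAC`, each with support 3. The `maximal` filter compares every pair of frequent patterns, which is fine for illustration but would not scale to long biological sequences; that scaling problem is what dedicated methods such as the one described in this abstract aim to address.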
  • Keywords
    DNA; biology computing; data mining; molecular biophysics; proteins; sequences; DNA; MacosVSpan algorithm; amino acid sequences; biological sequences; mining; PrefixSpan algorithm; sequence patterns; Amino acids; Biochemistry; Bioinformatics; Biology computing; DNA computing; Databases; Genetics; Information analysis; Pattern analysis; Sequences; Bioinformatics; biological sequence analysis; sequential pattern mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
  • Conference_Location
    Boston, MA
  • Print_ISBN
    978-1-4244-1509-0
  • Type
    conf
  • DOI
    10.1109/BIBE.2007.4375640
  • Filename
    4375640