• DocumentCode
    1605187
  • Title
    Gene Sequences Clustering and Identifying Functional Domain Using a Suffix Tree Algorithm
  • Author
    Sang il Han ; Lee, Sung Gun ; Hwang, Kyu Suk ; Kim, Young Han
  • Author_Institution
    Dept. of Chem. Eng., Busan Nat. Univ.
  • fYear
    2006
  • Firstpage
    4672
  • Lastpage
    4675
  • Abstract
    Most multiple gene sequence alignment methods rely on conventions regarding the score of a multiple alignment by pairwise alignment. Therefore, as the number of sequences increases, the runtime of sequencing expands exponentially. To solve this problem, this paper presents a multiple sequence alignment method that uses a linear-time suffix tree algorithm to cluster similar sequences at one time without pairwise alignment. After searching for common subsequences, cross-matching common subsequences were generated, and sometimes inexact matching was found. A procedure for masking the inexact cross-matching pairs is therefore suggested. In addition, BLAST was combined with a clustering tool to annotate the clusters generated by suffix tree clustering. The performance of the proposed system, CLAGen, was successfully evaluated with 42 gene sequences in a TCA cycle (a citrate cycle) of bacteria, identifying 11 clusters.
  • Keywords
    biology computing; database management systems; genetics; pattern clustering; pattern matching; sequences; trees (mathematics); BLAST; TCA cycle; citrate cycle; cross-matching common subsequences; functional domain identification; gene sequences clustering; linear-time suffix tree algorithm; multiple sequence alignment method; pairwise alignment; Bioinformatics; Chemical engineering; Clustering algorithms; Clustering methods; DNA; Genomics; Microorganisms; Proteins; Runtime; Sequences; BLAST; CLAGen; Clustering; Gene sequence; Multiple sequence alignment; TCA cycle
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    SICE-ICASE, 2006. International Joint Conference
  • Conference_Location
    Busan
  • Print_ISBN
    89-950038-4-7
  • Electronic_ISBN
    89-950038-5-5
  • Type
    conf
  • DOI
    10.1109/SICE.2006.314699
  • Filename
    4108503