Title :
Gene Sequences Clustering and Identifying Functional Domain Using a Suffix Tree Algorithm
Author :
Sang il Han ; Lee, Sung Gun ; Hwang, Kyu Suk ; Kim, Young Han
Author_Institution :
Dept. of Chem. Eng., Busan Nat. Univ.
Abstract :
Most multiple gene sequence alignment methods rely on conventions that score a multiple alignment from pairwise alignments. As a result, the runtime grows exponentially as the number of sequences increases. To address this problem, this paper presents a multiple sequence alignment method that uses a linear-time suffix tree algorithm to cluster similar sequences in a single pass, without pairwise alignment. The search for common subsequences generates cross-matching common subsequences, some of which match inexactly, so a procedure for masking the inexact cross-matching pairs is proposed. In addition, BLAST is combined with the clustering tool to annotate the clusters generated by suffix tree clustering. The proposed system, CLAGen, was evaluated with 42 gene sequences from the bacterial TCA cycle (citrate cycle) and identified 11 clusters.
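The idea of clustering sequences by shared common subsequences, without pairwise alignment, can be illustrated with a minimal sketch. It is not the CLAGen implementation: it swaps the linear-time suffix tree for a plain dictionary of fixed-length substrings, and it omits the masking and BLAST annotation steps. The function name `shared_kmer_clusters` and the parameter `k` are assumptions made for illustration.

```python
from collections import defaultdict


def shared_kmer_clusters(seqs, k):
    """Group sequences that share at least one exact substring of length k.

    Illustrative stand-in for suffix tree clustering: a dictionary of
    length-k substrings replaces the suffix tree (hypothetical helper,
    not taken from the paper).
    """
    # Index: substring of length k -> indices of the sequences containing it.
    index = defaultdict(set)
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            index[s[j:j + k]].add(i)

    # Union-find over sequence indices.
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every group of sequences that shares a substring.
    for ids in index.values():
        ordered = sorted(ids)
        for other in ordered[1:]:
            ra, rb = find(ordered[0]), find(other)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    clusters = defaultdict(list)
    for i in range(len(seqs)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGTAA", "GGGGCCCC", "CCCCGGGA", "ATATATAT"]
    print(shared_kmer_clusters(toy, 4))
```

Each sequence is scanned once to build the index, so no sequence pair is ever aligned directly. A real suffix tree would additionally find shared subsequences of any length rather than one fixed `k`.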
Keywords :
biology computing; database management systems; genetics; pattern clustering; pattern matching; sequences; trees (mathematics); BLAST; TCA cycle; citrate cycle; cross-matching common subsequences; functional domain identification; gene sequences clustering; linear-time suffix tree algorithm; multiple sequence alignment method; pairwise alignment; Bioinformatics; Chemical engineering; Clustering algorithms; Clustering methods; DNA; Genomics; Microorganisms; Proteins; Runtime; Sequences; CLAGen; Clustering; Gene sequence; Multiple sequence alignment
Conference_Title :
SICE-ICASE, 2006. International Joint Conference
Conference_Location :
Busan
Print_ISBN :
89-950038-4-7
Electronic_ISBN :
89-950038-5-5
DOI :
10.1109/SICE.2006.314699