Title :
A sliding window and keyword tree based algorithm for multiple sequence alignment
Author :
Wang, Jun ; Sun, Yong
Author_Institution :
Sch. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
Abstract :
Multiple sequence alignment (MSA) is an important problem in genetic sequence analysis. The increasing volume of genome data requires tools that can compare and align sequences quickly and accurately. The most important step of MSA is determining the reference sequence. Current alignment methods usually take a long time to find the reference sequence among long sequences, and the accuracy of the sequence they select still needs to improve. In this paper, an algorithm based on a sliding window and a keyword tree is used to match the substring sets of the sequence data and find the sequence most likely to be the reference. The proposed method can accurately find the center sequence and the completely matching regions. Using these regions, the algorithm aligns the multiple sequences with an improved center star method. Both the running time and the accuracy of the alignment depend on the step size by which the sliding window advances. Experimental results indicate that the improved method is faster and more accurate than other methods.
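The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It cuts fixed-width windows from each candidate sequence, stores them in a keyword tree, and picks as center the sequence whose windows match the other sequences most often. The function names, the plain nested-dict trie, and the raw hit count used as a score are all assumptions made for illustration; the paper's probability-based selection and improved center star step are not reproduced here.

```python
from typing import Dict, List


def windows(seq: str, width: int, step: int) -> List[str]:
    """Substrings of length `width`, taken every `step` positions (hypothetical helper)."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]


def build_trie(words: List[str]) -> Dict:
    """Nested-dict keyword tree; the key "$" marks the end of a keyword."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def count_hits(trie: Dict, text: str, width: int) -> int:
    """Count start positions in `text` where a stored keyword of length `width` occurs."""
    hits = 0
    for start in range(len(text) - width + 1):
        node = trie
        for ch in text[start:start + width]:
            node = node.get(ch)
            if node is None:
                break  # no keyword continues with this character
        if node is not None and "$" in node:
            hits += 1
    return hits


def pick_center(seqs: List[str], width: int, step: int) -> int:
    """Index of the sequence whose windows match the other sequences most often.

    Assumed scoring: total exact-match hits, first index wins on ties.
    """
    scores = []
    for i, cand in enumerate(seqs):
        trie = build_trie(windows(cand, width, step))
        scores.append(sum(count_hits(trie, other, width)
                          for j, other in enumerate(seqs) if j != i))
    return max(range(len(seqs)), key=scores.__getitem__)


if __name__ == "__main__":
    seqs = ["AAAACCCC", "CCCCGGGG", "GGGGTTTT"]
    print(pick_center(seqs, width=4, step=1))
```

A larger `step` stores fewer windows per candidate, which shortens tree construction but can miss matching regions; this is one simple way to see the trade-off between running time and accuracy that the abstract attributes to the step size.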
Keywords :
bioinformatics; genetics; genomics; probability; string matching; text analysis; MSA; aligning method; center sequence; center star method; complete matching region; genetic sequence analysis; genome data; keyword tree based algorithm; multiple sequence alignment; reference sequence determination; sequence data; sliding window; substring set matching; Accuracy; Complexity theory; Educational institutions; Heuristic algorithms; keyword tree; reference sequence; sliding windows
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
DOI :
10.1109/FSKD.2012.6233880