DocumentCode :
2341858
Title :
Selective tree growing: a deterministic constant-space linear-time algorithm for pattern discovery and for computing multiple sequence alignment
Author :
Sambasivam, Mashilamani
fYear :
2002
fDate :
2002
Firstpage :
344
Abstract :
Summary form only given. Given a set of n sequences, the multiple sequence alignment problem is to align these n sequences, with or without gaps, so that what the sequences have in common is exposed. If m is the total length of the input sequences, A is the alphabet size, and P is the final number of unique patterns, fixed by the user, that cause an alignment between sequences, then the algorithm runs in O(m(A + P)) time, i.e., linear worst-case time. The algorithm works on sequences over both small and large alphabets. It forms the alignment by first discovering patterns, and is therefore also a pattern discovery solution. We support our theoretical conclusions with experimental results obtained by running the algorithm on GenPept sequences and human genome sequences from the GenBank public domain database. The algorithm uses direct n-wise alignment and constant memory space irrespective of the value of m. What differentiates it from most other algorithms is that it is deterministic: it is guaranteed, and theoretically proved, to find all patterns of arbitrary length that occur in at least k sequences and that are responsible for the multiple sequence alignment, where k is specified by the user.
Keywords :
biology computing; computational complexity; deterministic algorithms; genetics; pattern recognition; sequences; trees (mathematics); GenBank public domain database; GenPept sequences; alphabet size; constant memory space; deterministic constant-space linear-time algorithm; direct n-wise alignment; human genome sequences; input sequences; linear worst case time; multiple sequence alignment; pattern discovery; selective tree growing; time bound; unique patterns; Bioinformatics; Clocks; Computer Society; DNA; Databases; Genomics; Humans; Linux; Pattern matching; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN :
0-7695-1653-X
Type :
conf
DOI :
10.1109/CSB.2002.1039367
Filename :
1039367