Regional Information Center for Science and Technology - Selective tree growing: a deterministic constant-space linear-time algorithm for pattern discovery and for computing multiple sequence alignment

DocumentCode :

2341858

Title :

Selective tree growing: a deterministic constant-space linear-time algorithm for pattern discovery and for computing multiple sequence alignment

Author :

Sambasivam, Mashilamani

Year :

2002

Date :

2002

Firstpage :

344

Abstract :

Summary form only given. Given a set of n sequences, the multiple sequence alignment problem is to align them, with or without gaps, so that what the sequences have in common is brought out appropriately. Let m be the total length of the input sequences, A the alphabet size, and P the final number of unique patterns, fixed by the user, that cause an alignment between sequences. The algorithm then runs in O(m(A + P)) time, which is linear worst-case time. It works on sequences over both small and large alphabets. Because it forms the alignment by first discovering patterns, it is also a pattern discovery solution. We support our theoretical conclusions with experimental results obtained by running the algorithm on GenPept sequences and human genome sequences from the GenBank public domain database. The algorithm uses direct n-wise alignment and constant memory space irrespective of the value of m. What differentiates it from most other algorithms is that it is deterministic: it is theoretically proved that the algorithm finds all patterns of arbitrary length that occur in at least k sequences and are responsible for the multiple sequence alignment, where k is specified by the user.
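
To make the quantities in the abstract concrete, the following Python sketch shows what "patterns that occur in at least k sequences" means on a toy input. It is not the selective tree growing algorithm, which this record does not describe in detail. It is a naive fixed-length substring count, and it uses memory proportional to the number of distinct substrings rather than constant space. The names patterns_in_at_least_k, input_size, and the pattern length parameter are assumptions introduced only for this illustration.

```python
from collections import defaultdict


def patterns_in_at_least_k(sequences, length, k):
    """Naive illustration (NOT the paper's algorithm): return a dict that maps
    every substring of the given length occurring in at least k distinct
    sequences to the sorted indices of the sequences containing it."""
    seen = defaultdict(set)
    for idx, seq in enumerate(sequences):
        # Slide a fixed-width window over the sequence.
        for start in range(len(seq) - length + 1):
            seen[seq[start:start + length]].add(idx)
    return {p: sorted(ids) for p, ids in seen.items() if len(ids) >= k}


def input_size(sequences):
    """m in the abstract: the total length of all input sequences."""
    return sum(len(s) for s in sequences)


# Toy DNA input (alphabet size A = 4); hypothetical data, not from GenBank.
sequences = ["ACGTAC", "TTACGA", "GGACGT"]
found = patterns_in_at_least_k(sequences, length=3, k=3)
print(input_size(sequences), found)
```

On this toy input only ACG is shared by all three sequences. Lowering k to 2 also admits CGT and TAC, which shows how the user-specified k controls which patterns can anchor an alignment.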

Keywords :

biology computing; computational complexity; deterministic algorithms; genetics; pattern recognition; sequences; trees (mathematics); GenBank public domain database; GenPept sequences; alphabet size; constant memory space; deterministic constant-space linear-time algorithm; direct n-wise alignment; human genome sequences; input sequences; linear worst case time; multiple sequence alignment; pattern discovery; selective tree growing; time bound; unique patterns; Bioinformatics; Clocks; Computer Society; DNA; Databases; Genomics; Humans; Linux; Pattern matching; Sequences;

Language :

English

Publisher :

IEEE

Conference_Title :

Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society

Print_ISBN :

0-7695-1653-X

Type :

conf

DOI :

10.1109/CSB.2002.1039367

Filename :

1039367

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2341858