Title :
Efficient and Scalable Motif Discovery using Graph-based Search
Author :
Sinha, Amit U. ; Bhatnagar, Raj
Author_Institution :
Dept. of ECECS, Cincinnati Univ., OH
Abstract :
Identification of short repeated patterns (motifs) in genomic sequences is key to many problems in bioinformatics. The promoter regions of genes are an important target of search for such motifs, which correspond to transcription factor binding sites. We present a new algorithm, Mortice, for detecting potential binding sites present in a given set of genomic sequences. An informed search is performed by organizing the input patterns and their variants in a graph, a strategy that leads efficiently to the desired solutions. The background is modeled as a Markov process, and a composite score function is used. We demonstrate the performance of our algorithm by testing it on real-life data sets from yeast and human promoter sequences. We compare its performance with several popular algorithms and find that the other algorithms work well on lower organisms such as yeast, but only a couple of them work well on human data. We show that our algorithm scales linearly with the size of the input dataset. We also compare the computational efficiency of our algorithm with that of other algorithms and show that it runs faster across different datasets and motif sizes.
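The abstract states that the background is modeled as a Markov process but does not give the model order or how it enters the composite score. As an illustration only, the sketch below assumes a first-order Markov background over DNA, estimated from the input sequences with additive pseudocounts, and uses it to compute the background log-probability of a candidate pattern. The function names, the first-order assumption, and the pseudocount value are hypothetical and are not taken from Mortice.

```python
import math

ALPHABET = "ACGT"


def train_markov_background(sequences, pseudocount=1.0):
    """Estimate a first-order Markov background from DNA sequences.

    Hypothetical sketch: returns (initial, transition) where
    initial[a] = P(a) and transition[a][b] = P(b | a), both smoothed
    with an additive pseudocount. Non-ACGT characters are skipped.
    """
    base_counts = {a: pseudocount for a in ALPHABET}
    pair_counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for i, base in enumerate(seq):
            if base not in ALPHABET:
                continue
            base_counts[base] += 1
            # Count the transition only if the previous base is valid DNA.
            if i > 0 and seq[i - 1] in ALPHABET:
                pair_counts[seq[i - 1]][base] += 1
    total = sum(base_counts.values())
    initial = {a: base_counts[a] / total for a in ALPHABET}
    transition = {}
    for a in ALPHABET:
        row_total = sum(pair_counts[a].values())
        transition[a] = {b: pair_counts[a][b] / row_total for b in ALPHABET}
    return initial, transition


def background_log_prob(kmer, initial, transition):
    """Log-probability of a k-mer under the first-order background model."""
    kmer = kmer.upper()
    logp = math.log(initial[kmer[0]])
    for prev, cur in zip(kmer, kmer[1:]):
        logp += math.log(transition[prev][cur])
    return logp


if __name__ == "__main__":
    init, trans = train_markov_background(["ACGTACGTAA", "AAAACCCCGG"])
    print(background_log_prob("ACG", init, trans))
```

A pattern that is common under the background model receives a higher (less negative) log-probability; a motif score would typically reward patterns that occur in the input more often than this background predicts, although the exact form of Mortice's composite score is not described here.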
Keywords :
biology computing; data mining; genetics; graph theory; search problems; Mortice algorithm; binding sites; bioinformatics; efficient motif discovery; genomic sequences; graph based search; scalable motif discovery; short repeated patterns; Bioinformatics; Computational biology; Computational intelligence; DNA; Fungi; Genomics; Humans; Libraries; Proteins; Sequences;
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0710-9
DOI :
10.1109/CIBCB.2007.4221224