Title :
Accelerating motif finding problem using grid computing with enhanced Brute Force
Author_Institution :
Comput. Syst. Dept., Ain Shams Univ., Cairo, Egypt
Abstract :
The motif finding problem is central to understanding the mechanisms of gene expression regulation. A motif is generally defined as a recurring pattern in a sequence of nucleotides or amino acids. In a DNA sequence, it is usually a short segment that occurs frequently, although the occurrences are not required to be exact copies of one another. This property makes motif mining very difficult; in fact, the motif finding problem has been proven to be NP-complete. Many algorithms fail to solve the well-known challenge problem (15, 4), that is, finding a motif of length 15 with at most 4 mutations, because of the huge runtime needed. Others fail to distinguish the motif from background sequences. Brute Force is an exact algorithm that never fails to find the motif, but it suffers from an intractable running time. In this paper, we present a novel approach to accelerating motif finding using grid computing. Grid computing makes low-cost high-performance computing (HPC) infrastructures available, and the best candidates for such infrastructures are applications that require high computational power, large storage capacity, or fast, high-throughput networking. The motif finding problem is a natural fit for grid computing. We deployed an enhanced version of Brute Force, called skip Brute Force, on the EUMEDGRID infrastructure (a project co-funded by the European Commission in the framework of FP6, which aims to support the development of a Grid e-Infrastructure in the Mediterranean area and to promote the porting of computationally intensive applications to the Grid platform). The idea behind our skip Brute Force algorithm is that it skips all iterations that cannot lead to a correct solution. The message passing programming paradigm is used because it assumes a partitioned address space and supports explicit parallelization.
Our experimental results showed a performance boost without sacrificing the exactness of Brute Force.
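The abstract does not spell out the skipping rule or the way work is divided among grid nodes, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that "skipping" means abandoning a candidate prefix as soon as some sequence has no window within d mismatches of it, and that work is partitioned by candidate prefix, as a message passing program might do per rank. All function names (min_hamming, brute_force, prefix_feasible, skip_search, prefixes_for_rank) are hypothetical. For scale, plain brute force on the (15, 4) problem enumerates 4^15 = 1,073,741,824 candidate motifs.

```python
from itertools import product

ALPHABET = "ACGT"


def min_hamming(pattern, seq):
    """Smallest Hamming distance between pattern and any window of seq."""
    k = len(pattern)
    return min(
        sum(a != b for a, b in zip(pattern, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def brute_force(seqs, l, d):
    """Plain exhaustive search: test every one of the 4**l candidates."""
    return [
        "".join(c)
        for c in product(ALPHABET, repeat=l)
        if all(min_hamming("".join(c), s) <= d for s in seqs)
    ]


def prefix_feasible(prefix, seqs, l, d):
    """True if every sequence has a length-l window whose first len(prefix)
    characters differ from prefix in at most d positions.
    Assumes every sequence is at least l characters long."""
    j = len(prefix)
    for seq in seqs:
        best = min(
            sum(a != b for a, b in zip(prefix, seq[i:i + j]))
            for i in range(len(seq) - l + 1)
        )
        if best > d:
            return False
    return True


def skip_search(seqs, l, d, prefix=""):
    """Assumed skipping rule: mismatches can only grow as a prefix is
    extended, so an infeasible prefix rules out its whole subtree."""
    if not prefix_feasible(prefix, seqs, l, d):
        return []  # skip every candidate that starts with this prefix
    if len(prefix) == l:
        return [prefix]  # at full length, feasibility is the exact test
    found = []
    for base in ALPHABET:
        found.extend(skip_search(seqs, l, d, prefix + base))
    return found


def prefixes_for_rank(rank, size, depth=2):
    """Hypothetical static partition: deal the 4**depth prefixes out
    round-robin so that each rank searches a disjoint part of the space."""
    all_prefixes = ["".join(p) for p in product(ALPHABET, repeat=depth)]
    return all_prefixes[rank::size]
```

In a message passing setting, each process would call skip_search once per prefix returned by prefixes_for_rank(rank, size) and send its hits to a collecting process. Because the pruning only discards prefixes that cannot be extended into a valid motif, the union of the per-rank results should match brute_force on the same input.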
Keywords :
DNA; bioinformatics; computational complexity; grid computing; message passing; optimisation; DNA sequence; EUMEDGRID infrastructure; NP-complete problem; enhanced brute force; exact algorithms; gene expression regulation; high-performance computing; message passing programming; motif finding problem; Acceleration; Amino acids; Computer applications; Computer networks; Gene expression; Genetic mutations; Runtime; Sequences; DNA sequence analysis; Motif finding
Conference_Title :
Advanced Communication Technology (ICACT), 2010 The 12th International Conference on
Conference_Location :
Phoenix Park
Print_ISBN :
978-1-4244-5427-3