• DocumentCode
    1473663
  • Title

    Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format

  • Author

    Bustamam, Alhadi ; Burrage, Kevin ; Hamilton, Nicholas A.

  • Author_Institution
    Dept. of Math., Univ. of Indonesia, Depok, Indonesia
  • Volume
    9
  • Issue
    3
  • fYear
    2012
  • Firstpage
    679
  • Lastpage
    692
  • Abstract
    Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with increasing vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA tool for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU is efficiently lowering the latency time, thus, circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized ELLPACK-R sparse format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop-machines, that were previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
  • Keywords
    Markov processes; bioinformatics; graphics processing units; large-scale systems; parallel architectures; CUDA tool; ELLPACK-R sparse format; GPU computing; bioinformatics; biological networks; critical limiting factor; fast Markov clustering algorithm; fast parallel Markov clustering; fine-grain massively parallel processing; interaction networks data sets; large-scale parallel computation; massively parallel computing environment; off-the-shelf desktop-machines; on-chip memory; parallel sparse Markov matrix normalizations; parallel sparse matrix-matrix computations; supercomputing architectures; Bioinformatics; Graphics processing unit; Instruction sets; Markov processes; Multicore processing; Parallel processing; Proteins; CUDA; ELLPACK-R sparse format; GPU computing; Markov clustering; PPI networks; bioinformatics.; graphs and networks; parallelism and concurrency; performance evaluation; scalable parallel programming; Cluster Analysis; Computational Biology; Computer Graphics; Computer Simulation; Markov Chains; Oligonucleotide Array Sequence Analysis;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    ieee
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2011.68
  • Filename
    6171972