DocumentCode :
2837613
Title :
Biodoop: Bioinformatics on Hadoop
Author :
Leo, Simone ; Santoni, Federico ; Zanetti, Gianluigi
Author_Institution :
CRS4, Pula, Italy
fYear :
2009
fDate :
22-25 Sept. 2009
Firstpage :
415
Lastpage :
422
Abstract :
Bioinformatics applications currently require both processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well-suited to this task and for which an open source implementation (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively low-weight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third one can be considered as containing a mixture of these two computational flavors. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
Keywords :
bioinformatics; parallel processing; public domain software; BLAST; Biodoop; GRAMMAR; GSEA; Hadoop bioinformatics; MapReduce; computational efficiency; data processing; data sets; low weight computation; open source implementation; parallel computing; scalability; Algorithm design and analysis; Bioinformatics; Computational efficiency; Concurrent computing; Data analysis; File systems; Libraries; Master-slave; Parallel processing; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing Workshops, 2009. ICPPW '09. International Conference on
Conference_Location :
Vienna
ISSN :
1530-2016
Print_ISBN :
978-1-4244-4923-1
Electronic_ISBN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2009.37
Filename :
5364545