Title :
Data Decomposition in Biomedical e-Science Applications
Author :
Mohammed, Yassene ; Shahand, Shayan ; Korkhov, Vladimir ; Luyf, Angela C M ; Van Schaik, Barbera D C ; Caan, Matthan W A ; Van Kampen, Antoine H C ; Palmblad, Magnus ; Olabarriaga, Silvia D.
Author_Institution :
Dept. of Clinical Epidemiology, Biostat., & Bioinf., Univ. of Amsterdam, Amsterdam, Netherlands
Abstract :
As the focus of e-Science moves toward the fourth paradigm and data-intensive science, data access remains dependent on the architecture of the e-Science infrastructure in use. Such architecture is in general job-driven, i.e., a (grid) job is a sequence of commands that run on the same worker node. Making use of the infrastructure requires a parallelized application, which is achieved foremost by data decomposition. In the general practice of parallel programming, data decomposition depends on the programmer's experience and knowledge of the data and the algorithm/application. Data mining scientists, on the other hand, have an established foundation for data decomposition: automatic decomposition methods are already in use, and methodologies and patterns are defined. Our experience in porting biomedical applications to the Dutch e-Science infrastructure shows that the data decomposition used to gain parallelism fits, to some degree, a subgroup of the data mining decomposition patterns, namely object set decomposition. In this paper we discuss porting three biomedical packages to a grid computing environment, two for medical imaging and one for DNA sequencing. We show how the data access of the applications was reengineered around the executables to make use of the parallel capacity of the e-Science infrastructure.
Keywords :
bioinformatics; data mining; grid computing; information retrieval; medical information systems; parallel programming; DNA sequencing; Dutch e-Science infrastructure; automatic data decomposition methods; biomedical e-science applications; biomedical packages; data access; data intensive science; data mining decomposition patterns; grid computing environment; medical imaging; object set decomposition; Biomedical imaging; Data mining; Diffusion tensor imaging; Educational institutions; Parallel processing; Pipelines; Healthgrid; data decomposition; e-infrastructure; grid Computing; legacy application; porting to grid; workflows
Conference_Titel :
e-Science Workshops (eScienceW), 2011 IEEE Seventh International Conference on
Conference_Location :
Stockholm
Print_ISBN :
978-1-4673-0026-1
DOI :
10.1109/eScienceW.2011.7