DocumentCode :
1913942
Title :
A Hadoop-based Massive Molecular Data Storage Solution for Virtual Screening
Author :
Zhang, Yan ; Zhang, Ruisheng ; Chen, Qiuqiang ; Gao, Xiaopan ; Hu, Rongjing ; Zhang, Ying ; Liu, Guangcai
Author_Institution :
Eng. Res. Center of Open Source Software & Real-time Syst., Lanzhou Univ., Lanzhou, China
fYear :
2012
fDate :
20-23 Sept. 2012
Firstpage :
142
Lastpage :
147
Abstract :
Virtual screening involves massive computing tasks in which millions of molecules are docked against a target protein. Such data-intensive science faces the challenge of managing datasets of tens of terabytes, which creates a need for large-scale storage. Efficient query and transmission of these large-scale datasets are further key requirements during the virtual screening process. A massive data storage solution is therefore expected to improve the efficiency of storing and accessing large numbers of molecules and their docking results, and to facilitate the data preparation and analysis phases of virtual screening. To address these requirements, we propose a novel Hadoop-based storage solution for virtual screening. HBase serves as a distributed database that persists the properties of the molecules and the docking results, and HDFS serves as the storage system for molecule source files. A comparison of system performance is also presented. We conclude that the proposed storage solution can be considered an alternative approach for enabling efficient storage of and access to large-scale molecules and docking results in virtual screening research.
Keywords :
biology computing; data analysis; distributed databases; drugs; molecular biophysics; proteins; query processing; storage management; HBase; Hadoop-based massive molecular data storage solution; TB datasets; data analysis phase; data preparing phase; data-intensive application; distributed database; large-scale dataset query; large-scale dataset transmission; large-scale molecules; molecule source files storage system; protein; virtual screening; Chemicals; Distributed databases; Fault tolerance; Fault tolerant systems; Indexes; Memory; Cloud Computing; HBase; HDFS; Hadoop; Massive Data Storage; Virtual Screening;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
ChinaGrid Annual Conference (ChinaGrid), 2012 Seventh
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2623-0
Electronic_ISBN :
978-0-7695-4816-6
Type :
conf
DOI :
10.1109/ChinaGrid.2012.26
Filename :
6337289