Title :
Combining Hadoop and GPU to preprocess large Affymetrix microarray data
Author :
Sufeng Niu ; Guangyu Yang ; Sarma, Nilim ; Pengfei Xuan ; Smith, Malcolm C. ; Srimani, Pradip ; Feng Luo
Author_Institution :
Holcombe Dept. of Electr. & Comput. Eng., Clemson Univ., Clemson, SC, USA
Abstract :
High-density oligonucleotide arrays (microarrays) from Affymetrix have been widely used to measure gene expression. Public data repositories, such as the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI), have accumulated large amounts of microarray data. Efficient integrative analysis of these data will provide significant knowledge about biological systems. However, none of the existing microarray preprocessing and quality assessment tools can handle very large microarray datasets with tens of thousands of experiments. The preprocessing and quality assessment of microarray datasets involve both data-intensive and compute-intensive tasks. In this paper, we develop a new set of tools that combines Hadoop (for data-intensive tasks) with general-purpose graphics processing units (GPGPUs) (for compute-intensive tasks) to efficiently process large microarray data. Evaluation of our new tools on large microarray datasets with tens of thousands of experiments showed promising performance. We demonstrate that the combination of Hadoop and GPGPU computation is effective for complex scientific applications that contain both data-intensive and compute-intensive tasks. Our new tool set will make it possible to utilize the valuable large microarray data in public repositories.
Keywords :
biology computing; data analysis; graphics processing units; lab-on-a-chip; Affymetrix microarray data preprocessing; GEO; GPGPUs; Hadoop; NCBI; National Center for Biotechnology Information; biological systems; compute intensive tasks; data intensive tasks; gene expression measurements; gene expression omnibus; general-purpose graphics processing units; high density oligonucleotide array; integrative microarray data analysis; Arrays; Computational modeling; Graphics processing units; Matrix decomposition; Probes; Quality assessment; Symmetric matrices; Map/Reduce; Microarray Preprocessing;
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004293