DocumentCode :
1791831
Title :
Building Wrangler: A transformational data intensive resource for the open science community
Author :
Gaffney, Niall ; Jordan, Christopher ; Minyard, Tommy ; Stanzione, Dan
Author_Institution :
Texas Adv. Comput. Center, Univ. of Texas at Austin, Austin, TX, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
20
Lastpage :
22
Abstract :
With the growth of data in science and engineering fields and the I/O intense technologies used to carry out research with these massive datasets, it has become clear that new solutions to support data research are required. In support of this, the Texas Advanced Computing Center presents Wrangler, the first open science research platform built from the ground up in support of data. Wrangler features a replicated 10 PB Lustre-based parallel file system, compute capacity of 120 Intel Haswell nodes, and 15 TB of RAM. In addition to the base system, Wrangler features a unique NAND flash-based storage system from DSSD, providing users with 0.5 PB of storage, 1 TB/s of bandwidth, and 250 million IOP/s across the cluster. Supporting Hadoop, but not just Hadoop, Wrangler will provide current and future researchers with an environment supporting the most I/O intensive workflows in fields from astronomy to paleontology. With data at the forefront of Wrangler's mission, support for ETL workflows, data curation, and data publication will assist users as they both discover new results and publish their own research. Support for both SQL and NoSQL databases and GIS-based extensions will also be provided, allowing users to leverage these tools for both data cataloging and cross-study integration. Wrangler will allow users to focus more on what is most important to them, the data and the knowledge gained from its analysis, and less on the details of curation and I/O optimization.
Keywords :
NAND circuits; SQL; data handling; file organisation; flash memories; parallel processing; random-access storage; DSSD; ETL workflow; GIS based extension; Hadoop; I/O intense technology; I/O intensive workflow; I/O optimization; Intel Haswell nodes; NAND flash-based storage system; PB Lustre based parallel file system; RAM; Texas Advanced Computing Center; Wrangler; cross-study integration; data cataloging; data curation; data publication; data research; noSQL database; open science community; open science research platform; transformational data intensive resource; Bandwidth; Communities; Decision support systems; Distributed databases; File systems; Servers; Data Analysis; Data Systems; Data storage systems; Data transfer; Database machines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004480
Filename :
7004480