Title :
NOHAA: A NOvel Framework for HPC Analytics over Windows Azure
Author :
Qiangju Xiao ; Jun Wang ; Yan Ma ; Lizhe Wang
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
Abstract :
HPC analytics has become increasingly vital for analyzing the large volumes of data produced by sophisticated computing instruments. Meanwhile, with the successful development of cloud computing, more and more scientists are devoted to deploying HPC analytics in the ever-popular clouds. This poses new challenges, caused mainly by differences in storage architectures, resource management mechanisms, and programming APIs. First, there is a "data semantics" gap between the way data are stored by the cloud platform and the way the data will be accessed by HPC analytics. Second, in-house data-intensive clusters mostly distribute data across data nodes to achieve co-located computation and storage; public clouds find this hard to mimic because their data are stored centrally. In this paper, we develop a new HPC analytics framework called NOHAA, which provides 1) a semantics-aware intelligent data upload interface and 2) a locality-aware hierarchical storage system that supports co-located computation and storage on Windows Azure. Our extensive real-world experiments show that NOHAA reduces the average data access time by up to 85% and speeds up HPC analytics execution by a factor of 2 to 7.
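A minimal Python sketch of what a "data semantics" gap can look like. It illustrates the general problem only and is not NOHAA's upload interface. The record layout (an int32 id followed by a float64 value), the chunk sizes, and every function name below are hypothetical. The sketch compares two ways of splitting a file. Cutting it into fixed-size byte blocks, as a semantics-unaware store might, can split a record across two chunks. Aligning chunk boundaries to whole records lets an analytics task decode each chunk on its own.

```python
import struct

# Hypothetical record layout: little-endian int32 id + float64 value (12 bytes, no padding).
RECORD_FMT = "<id"
RECORD_SIZE = struct.calcsize(RECORD_FMT)


def make_dataset(n):
    """Build a toy binary file of n fixed-size records."""
    return b"".join(struct.pack(RECORD_FMT, i, i * 0.5) for i in range(n))


def split_fixed(data, chunk_bytes):
    """Semantics-unaware split: fixed byte blocks that may cut records in half."""
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


def split_record_aligned(data, target_bytes, record_size):
    """Semantics-aware split: round each chunk down to a whole number of records."""
    per_chunk = max(1, target_bytes // record_size)
    step = per_chunk * record_size
    return [data[i:i + step] for i in range(0, len(data), step)]


def decodable(chunk, record_size):
    """A chunk can be processed independently only if it holds whole records."""
    return len(chunk) % record_size == 0


def decode(chunk):
    """Unpack every record in a record-aligned chunk."""
    return [struct.unpack_from(RECORD_FMT, chunk, off)
            for off in range(0, len(chunk), RECORD_SIZE)]


if __name__ == "__main__":
    data = make_dataset(10)  # 10 records x 12 bytes = 120 bytes
    fixed = split_fixed(data, 50)
    aligned = split_record_aligned(data, 50, RECORD_SIZE)
    print([len(c) for c in fixed], [decodable(c, RECORD_SIZE) for c in fixed])
    print([len(c) for c in aligned], [decodable(c, RECORD_SIZE) for c in aligned])
```

With a 50-byte target, the fixed split produces 50-, 50-, and 20-byte chunks, and none of them holds a whole number of 12-byte records. The record-aligned split produces 48-, 48-, and 24-byte chunks that each decode cleanly.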
Keywords :
application program interfaces; cloud computing; memory architecture; mobile computing; parallel processing; HPC analytic execution time; NOHAA; Windows Azure; cloud platform; colocated computation; colocated storage; data nodes; data semantic gap; in-house data-intensive clusters; locality-aware hierarchical storage system; programming API; resource management mechanisms; semantic-aware intelligent data upload interface; storage architectures; Cloud computing; Computational modeling; Data models; Data preprocessing; Data transfer; Distributed databases; Semantics; Azure; Co-located Computation and Storage; Data-intensive; HDFS; HPC analytics; Hadoop; MapReduce
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4673-4565-1
ISSN :
1521-9097
DOI :
10.1109/ICPADS.2012.68