DocumentCode
2191037
Title
Distributed, Scalable Clustering for Detecting Halos in Terascale Astronomy Datasets
Author
Daruru, Srivatsava ; Dhandapani, Sankari ; Gupta, Gunjan ; Iliev, Ilian ; Xu, Weijia ; Navratil, Paul ; Marín, Nena ; Ghosh, Joydeep
Author_Institution
Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
fYear
2010
fDate
13 Dec. 2010
Firstpage
138
Lastpage
147
Abstract
Terascale astronomical datasets have the potential to provide unprecedented insights into the origins of our universe. However, automated techniques for identifying regions of interest are essential if domain experts are to cope with otherwise intractable volumes of simulation data. This paper addresses the important problem of locating and tracking high-density regions in space, which generally correspond to halos, sub-halos, and host galaxies. A density-based, mode-following clustering method called Automated Hierarchical Density Shaving (Auto-HDS) is adapted for this application. Auto-HDS can detect clusters of different densities while discarding the vast majority of background data. Two alternative parallel implementations of the algorithm are realized and compared: one based on the dataflow computational model and one based on Hadoop/MapReduce functional programming constructs. Through measurements of runtime performance and of scalability across compute cores and across increasing data volumes, we demonstrate the benefits of fine-grained parallelism. The proposed distributed, multithreaded Auto-HDS clustering algorithm is shown to produce high-quality clusters, to be computationally efficient, and to scale from 1 to 1024 compute cores.
Keywords
astronomy computing; data flow computing; functional programming; multi-threading; pattern clustering; Hadoop; MapReduce; automated hierarchical density shaving; dataflow computational model; distributed AutoHDS clustering algorithm; fine grain parallelism; halos detection; high density regions; host galaxies; multithreaded AutoHDS clustering algorithm; scalable clustering; terascale astronomy datasets; Astronomy; Distributed Clustering; Scalable; Terascale
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4244-9244-2
Electronic_ISBN
978-0-7695-4257-7
Type
conf
DOI
10.1109/ICDMW.2010.26
Filename
5693293