• DocumentCode
    2191037
  • Title
    Distributed, Scalable Clustering for Detecting Halos in Terascale Astronomy Datasets
  • Author
    Daruru, Srivatsava ; Dhandapani, Sankari ; Gupta, Gunjan ; Iliev, Ilian ; Xu, Weijia ; Navratil, Paul ; Marín, Nena ; Ghosh, Joydeep
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
  • fYear
    2010
  • fDate
    13 Dec. 2010
  • Firstpage
    138
  • Lastpage
    147
  • Abstract
    Terascale astronomical datasets have the potential to provide unprecedented insights into the origins of our universe. However, automated techniques for determining regions of interest are a must if domain experts are to cope with the intractable amounts of simulation data. This paper addresses the important problem of locating and tracking high-density regions in space that generally correspond to halos and sub-halos and that host galaxies. A density-based, mode-following clustering method called Automated Hierarchical Density Shaving (Auto-HDS) is adapted for this application. Auto-HDS can detect clusters of different densities while discarding the vast majority of background data. Two alternative parallel implementations of the algorithm, based respectively on the dataflow computational model and on Hadoop/MapReduce functional programming constructs, are realized and compared. Based on runtime performance and on scalability across compute cores and increasing data volumes, we demonstrate the benefits of fine-grain parallelism. The proposed distributed and multithreaded Auto-HDS clustering algorithm is shown to produce high-quality clusters, to be computationally efficient, and to scale from 1 to 1024 compute cores.
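    The abstract names Auto-HDS without spelling out its mechanics. As a rough, hypothetical illustration of the density-shaving idea only (not the authors' implementation, and omitting the hierarchy, automatic cluster selection, and any dataflow or MapReduce parallelism), the Python sketch below assumes that a point's density is measured by the distance to its n_eps-th nearest neighbour, that the least dense f_shave fraction of points is discarded as background, and that the remaining points lying within eps of one another are linked into clusters. The helper name shave_level and the parameters n_eps and f_shave are assumptions chosen for illustration.

```python
import math
from collections import deque


def shave_level(points, n_eps, f_shave):
    """Hypothetical single density-shaving level (illustrative sketch only).

    Assumptions: density proxy = distance to the n_eps-th nearest other
    point (smaller = denser); the least dense f_shave fraction is shaved
    off as background; eps is the largest such distance among the kept
    points; kept points within eps of each other form one cluster.
    Returns one label per point: -1 for shaved background, else a cluster id.
    """
    n = len(points)
    # Distance from each point to its n_eps-th nearest other point (brute force, O(n^2)).
    r = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r.append(d[n_eps - 1])

    # Keep the densest (1 - f_shave) fraction of points; the rest is background.
    n_keep = max(1, round(n * (1 - f_shave)))
    order = sorted(range(n), key=lambda i: r[i])
    kept = set(order[:n_keep])
    eps = max(r[i] for i in kept)

    # Breadth-first search for connected components among kept points.
    labels = [-1] * n
    next_id = 0
    for s in order[:n_keep]:
        if labels[s] != -1:
            continue
        labels[s] = next_id
        queue = deque([s])
        while queue:
            i = queue.popleft()
            for j in kept:
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = next_id
                    queue.append(j)
        next_id += 1
    return labels


if __name__ == "__main__":
    # Two tight groups plus two isolated points.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50), (-40, 30)]
    print(shave_level(pts, n_eps=2, f_shave=0.2))
```

    On this toy input the two isolated points are shaved away as background and the two tight groups come out as separate clusters. The brute-force neighbour search is only for readability; a terascale setting would need spatial indexing and the distributed execution the abstract describes.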
  • Keywords
    astronomy computing; data flow computing; functional programming; multi-threading; pattern clustering; Hadoop; MapReduce; automated hierarchical density shaving; dataflow computational model; distributed AutoHDS clustering algorithm; fine grain parallelism; halos detection; high density regions; host galaxies; multithreaded AutoHDS clustering algorithm; scalable clustering; terascale astronomy datasets; Astronomy; Distributed Clustering; Scalable; Terascale;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-9244-2
  • Electronic_ISBN
    978-0-7695-4257-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2010.26
  • Filename
    5693293