• DocumentCode
    2938194
  • Title

    An Efficient Cross-Match Implementation Based on Directed Join Algorithm in MapReduce

  • Author

    Cuncang Mi ; Qian Chen ; Taoying Liu

  • Author_Institution
    Inst. of Comput. Technol., Beijing, China
  • fYear
    2011
  • fDate
    5-8 Dec. 2011
  • Firstpage
    41
  • Lastpage
    48
  • Abstract
    In astronomy, "Cross-Match" is a common operation that mines useful information by joining different star catalogues. Star catalogues obtained through astronomical telescopes are now much larger than ever before, which drives us to consider implementing Cross-Match in a distributed computing environment. Although computer hardware is now cheap and resizable compute capacity in the cloud is available from some web services, we conduct our experiments in a restricted environment to conserve resources as much as possible. In our work, we first use Hive from Facebook, but find it less efficient than expected when joining two big catalogues. We then analyze Hive's join process and carry out some optimizations; however, the result is still not satisfactory. Finally, we design our own Cross-Match program, which is based on the directed join algorithm in MapReduce, takes advantage of the characteristics of astronomical data, and runs on top of Hadoop. Our program improves performance by 86% compared with the common join in Hive when cross-matching USNOA and 2MASS.
  • Keywords
    Web services; astronomical catalogues; astronomy computing; cloud computing; data handling; data mining; 2MASS; Cross-Match program; Facebook; Hadoop; Hive; MapReduce; USNOA; Web service; astronomical data; astronomical telescope; astronomy; cloud computing; directed join algorithm; distributed computing environment; star catalogue; useful information mining; Astronomy; Computational modeling; Data processing; Distributed databases; Indexes; Telescopes; Big star catalogues; Cross-Match; Directed Join; Hive; MapReduce;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on
  • Conference_Location
    Victoria, NSW
  • Print_ISBN
    978-1-4577-2116-8
  • Type

    conf

  • DOI
    10.1109/UCC.2011.16
  • Filename
    6123479