• DocumentCode
    2357466
  • Title

    Parallel and Distributed Astronomical Data Analysis on Grid Datafarm

  • Author

    Yamamoto, Naotaka ; Tatebe, Osamu ; Sekiguchi, Satoshi

  • Author_Institution
    Grid Technol. Res. Center, AIST, Ibaraki
  • fYear
    2004
  • fDate
    8-8 Nov. 2004
  • Firstpage
    461
  • Lastpage
    466
  • Abstract
    A comprehensive study of the entire petabyte-scale archive of astronomical observatory data could yield new science and new knowledge in the field, but it has not been feasible so far for lack of an adequate data analysis environment. The Grid Datafarm architecture is designed for global petabyte-scale data-intensive computing. It provides a Grid file system with file replica management for fault tolerance and load balancing, together with parallel and distributed computing support for sets of files, to meet the requirements of such a comprehensive study. In this paper, we discuss worldwide parallel and distributed data analysis in observational astronomy. The archival data is stored, replicated, and dispersed in a Gfarm file system. All the astronomical data analysis tools successfully access files in the Gfarm file system without any code modification, regardless of file replica locations, by using a syscall hooking library. Performance evaluation of the parallel data analysis, carried out in several ways, shows that file-affinity process scheduling plays an essential role in scalable and efficient parallel file I/O. A data calibration tool shows scalable file I/O performance, achieving 5.9 GB/sec and 4.0 GB/sec for reading and writing FITS files, respectively, using 30 cluster nodes (60 CPUs). On-demand file replica creation mitigates the overhead of access concentration. Another tool shows a sixfold performance improvement for reading a shared file when file replicas are created.
  • Keywords
    astronomical observatories; astronomy computing; data analysis; grid computing; parallel processing; Gfarm file system; Grid Datafarm architecture; distributed data computing; fault tolerance; file replica management; file-affinity process scheduling; load balancing; observational astronomical field; parallel data analysis; petabyte-scale archival data; Computer architecture; Concurrent computing; Data analysis; Distributed computing; Fault tolerant systems; File systems; Grid computing; Libraries; Load management; Observatories;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on
  • Conference_Location
    Pittsburgh, PA
  • ISSN
    1550-5510
  • Print_ISBN
    0-7695-2256-4
  • Type

    conf

  • DOI
    10.1109/GRID.2004.47
  • Filename
    1382867