• DocumentCode
    2321171
  • Title
    SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
  • Author
    Wang, Yi ; Jiang, Wei ; Agrawal, Gagan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2012
  • fDate
    13-16 May 2012
  • Firstpage
    443
  • Lastpage
    450
  • Abstract
    Despite the popularity of MapReduce, there are several obstacles to applying it to the development of scientific data analysis applications. Current MapReduce implementations require that data be loaded into specialized file systems, like the Hadoop Distributed File System (HDFS), whereas with the rapidly growing size of scientific datasets, reloading data into another file system or format is not feasible. We present a framework that allows scientific data in different formats to be processed with a MapReduce-like API. Our system is referred to as SciMATE, and is based on the MATE system developed at Ohio State. SciMATE is developed as a customizable system, which can be adapted to support processing of any scientific data format. We have demonstrated the functionality of our system by creating instances that can process the NetCDF and HDF5 formats as well as flat files. We have also implemented three popular data mining applications and have evaluated their execution with each of the three instances of our system.
  • Keywords
    data analysis; data mining; distributed databases; network operating systems; scientific information systems; API; HDF5 format; Hadoop distributed file system; MATE system; MapReduce; NetCDF format; SciMATE; customizable system; data reloading; scientific data analysis; scientific data format; scientific datasets; specialized file systems; Data models; Libraries; Loading; Optimization; Principal component analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
  • Conference_Location
    Ottawa, ON
  • Print_ISBN
    978-1-4673-1395-7
  • Type
    conf
  • DOI
    10.1109/CCGrid.2012.32
  • Filename
    6217452