• DocumentCode
    3717364
  • Title
    Is Apache Spark scalable to seismic data analytics and computations?
  • Author
    Yuzhong Yan;Lei Huang;Liqi Yi
  • Author_Institution
    Department of Computer Science, Prairie View A&M University, Prairie View, TX
  • fYear
    2015
  • Firstpage
    2036
  • Lastpage
    2045
  • Abstract
    High Performance Computing (HPC) has been the dominant technology used for seismic data processing in the petroleum industry. However, with increasing data sizes and variety, traditional HPC, which focuses on computation, faces new challenges. Researchers are looking for new computing platforms that balance performance and productivity while also providing big data analytics capabilities. Apache Spark is a new big data analytics platform that supports more than the map/reduce parallel execution mode, with good scalability and fault tolerance. In this paper, we try to answer the question of whether Apache Spark is scalable to seismic data processing with its in-memory computation and data locality features. We use a few typical seismic data processing algorithms to study performance and productivity. Our contributions include customized seismic data distributions in Spark, the extraction of commonly used templates for seismic data processing algorithms, and a performance analysis of several typical seismic processing algorithms.
  • Keywords
    "Big data","Industries","Sparks","Fault tolerance","Fault tolerant systems","Computational modeling"
  • Publisher
    IEEE
  • Conference_Titel
    2015 IEEE International Conference on Big Data (Big Data)
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363985
  • Filename
    7363985