  • DocumentCode
    1791565
  • Title
    Provenance-based object storage prediction scheme for scientific big data applications
  • Author
    Dong Dai; Yong Chen; Dries Kimpe; Robert Ross
  • Author_Institution
    Comput. Sci. Dept., Texas Tech Univ., Lubbock, TX, USA
  • fYear
    2014
  • fDate
    27-30 Oct. 2014
  • Firstpage
    271
  • Lastpage
    280
  • Abstract
    Object storage has been increasingly adopted in high-performance computing for scientific big data applications. With object storage, applications usually identify data by object IDs, queries, or collections rather than by files. Because the object store changes the way applications access data, it introduces new challenges for I/O prediction, which has traditionally relied on interfile or intrafile pattern detection. The key challenge is that the inputs of object-based applications are no longer expressed as static file names: they become much more dynamic and unstable, hidden inside application logic. Traditional prediction strategies do not work well under such conditions. In this paper, we introduce the use of provenance information, which was collected for data management in high-performance computing systems, to build accurate coarse-grained (object-level) input prediction. The predicted inputs can be preloaded into a burst buffer to accelerate future reads. To the best of our knowledge, this study is the first to use provenance information in object stores to predict application inputs. Evaluation results confirm the effectiveness and accuracy of our provenance-based prediction and show that the proposed prediction system is feasible for real-world deployment.
  • Keywords
    Big Data; parallel processing; scientific information systems; storage management; I/O prediction; coarse-grained input prediction; data management; high-performance computing system; interfile pattern detection; intrafile pattern detection; object-level input prediction; prediction strategy; prediction system; provenance information; provenance-based object storage prediction scheme; provenance-based prediction; real-world deployment; scientific big data application; static file name; Algorithm design and analysis; Big data; Complexity theory; Hidden Markov models; History; Prediction algorithms; Semantics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (Big Data), 2014 IEEE International Conference on
  • Conference_Location
    Washington, DC
  • Type
    conf
  • DOI
    10.1109/BigData.2014.7004242
  • Filename
    7004242