Title :
Provenance-based object storage prediction scheme for scientific big data applications
Author :
Dai, Dong ; Chen, Yong ; Kimpe, Dries ; Ross, Robert
Author_Institution :
Comput. Sci. Dept., Texas Tech Univ., Lubbock, TX, USA
Abstract :
Object storage has been increasingly adopted in high-performance computing for scientific big data applications. With object storage, applications usually identify data by object IDs, queries, or collections rather than by file names. Because the object store changes the way applications access data, it introduces new challenges for I/O prediction, which has traditionally relied on detecting interfile or intrafile access patterns. The key challenge is that the inputs of object-based applications are no longer expressed as static file names: they become much more dynamic and unstable, hidden inside application logic. Traditional prediction strategies do not work well under such conditions. In this paper, we introduce the use of provenance information, which is collected for data management in high-performance computing systems, to build an accurate coarse-grained (object-level) input prediction. The predicted inputs can be preloaded into a burst buffer to accelerate future reads. To the best of our knowledge, this study is the first to use provenance information in object stores to predict application inputs. Evaluation results confirm the effectiveness and accuracy of our provenance-based prediction and show that the proposed prediction system is feasible for real-world deployment.
Keywords :
Big Data; parallel processing; scientific information systems; storage management; I/O prediction; coarse-grained input prediction; data management; high-performance computing system; interfile pattern detection; intrafile pattern detection; object-level input prediction; prediction strategy; prediction system; provenance information; provenance-based object storage prediction scheme; provenance-based prediction; real-world deployment; scientific big data application; static file name; Algorithm design and analysis; Big data; Complexity theory; Hidden Markov models; History; Prediction algorithms; Semantics;
Conference_Title :
2014 IEEE International Conference on Big Data (Big Data)
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004242