DocumentCode :
659411
Title :
Locality-driven high-level I/O aggregation for processing scientific datasets
Author :
Jialin Liu ; Crysler, Bradly ; Yin Lu ; Yong Chen
Author_Institution :
Dept. of Comput. Sci., Texas Tech Univ., Lubbock, TX, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
103
Lastpage :
111
Abstract :
Scientific I/O libraries, like PnetCDF, ADIOS, and HDF5, have been commonly used to facilitate the array-based scientific dataset processing. The underlying physical data layout information, however, is usually hidden from the upper layer's logical access. Such mismatching can lead to poor I/O. In this research, we have observed performance degradation in the case of concurrent sub-array accesses, where overlaps among calls that access sub-arrays led to high contention on storage servers due to the logical-physical mismatching. We propose a locality-driven high-level I/O aggregation approach to address these issues in this work. By designing a logical-physical mapping scheme, we try to utilize the scientific dataset's structured formats and the file systems' data distribution to resolve the mismatching issue. Therefore the I/O can be carried out in a locality-driven fashion. The proposed approach is effective and complements the existing I/O strategies, such as the independent I/O and collective I/O strategy. We have also carried out experimental tests and the results confirm the performance improvement compared to existing I/O strategies. The proposed locality-driven high-level I/O aggregation approach holds a promise for efficiently processing scientific datasets, which is critical for the data intensive or big data computing era.
Keywords :
Big Data; data structures; natural sciences computing; storage management; ADIOS; HDF5; PnetCDF; array-based scientific dataset processing; big data computing; call overlap; collective I/O strategy; concurrent subarray access; data intensive computing; file system data distribution; independent I/O strategy; locality-driven high-level I/O aggregation approach; logical-physical mapping scheme; logical-physical mismatching; performance degradation; physical data layout information; scientific I/O libraries; scientific dataset structured format; storage server; upper layer logical access; Arrays; Correlation; Degradation; Information management; Layout; Libraries; Optimization; Big data; I/O aggregation; collective I/O; data intensive computing; high performance computing; scientific I/O library;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691560
Filename :
6691560