DocumentCode :
2049860
Title :
EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization
Author :
Tian, Yuan ; Klasky, Scott ; Abbasi, Hasan ; Lofstead, Jay ; Grout, Ray ; Podhorszki, Norbert ; Liu, Qing ; Wang, Yandong ; Yu, Weikuan
fYear :
2011
fDate :
26-30 Sept. 2011
Firstpage :
93
Lastpage :
102
Abstract :
Large-scale scientific applications are often bottlenecked by the writing of checkpoint-restart data. Much work has focused on improving their write performance. With the mounting needs of scientific discovery from these datasets, it is also important to provide good read performance for many common access patterns, which requires effective data organization. To address this issue, we introduce Elastic Data Organization (EDO), which can transparently enable different data organization strategies for scientific applications. Through its flexible data ordering algorithms, EDO harmonizes different access patterns with the underlying file system. Two levels of data ordering are introduced in EDO. The first works at the level of data groups (a.k.a. process groups). It uses Hilbert Space Filling Curves (SFC) to balance the distribution of data groups across storage targets. The second governs the ordering of data elements within a data group. It divides a data group into sub-chunks and strikes a good balance between the size of sub-chunks and the number of seek operations. Our experimental results demonstrate that EDO is able to achieve balanced data distribution across all dimensions and improve the read performance of multidimensional datasets in scientific applications.
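
The group-level ordering described in the abstract can be illustrated with a minimal sketch. This is not EDO's implementation: the 2-D grid of data groups, the power-of-two grid side, the round-robin placement, and the names `hilbert_index` and `assign_targets` are all assumptions made for illustration. The sketch computes each group's position along a Hilbert curve and then assigns groups to storage targets in curve order.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve that
    fills an n-by-n grid. n is assumed to be a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def assign_targets(n, num_targets):
    """Hypothetical placement policy: sort the n*n data groups by Hilbert
    index, then assign them to storage targets round-robin."""
    groups = [(x, y) for x in range(n) for y in range(n)]
    groups.sort(key=lambda g: hilbert_index(n, g[0], g[1]))
    return {g: rank % num_targets for rank, g in enumerate(groups)}


if __name__ == "__main__":
    placement = assign_targets(4, 4)
    for y in range(4):
        # Show which storage target holds each group in row y.
        print([placement[(x, y)] for x in range(4)])
```

Because neighbouring positions on the curve are also neighbours in the grid, consecutive groups in curve order go to different targets, so a read along either axis tends to touch several targets rather than one. The actual balance EDO achieves depends on details not given in this record.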
Keywords :
Hilbert spaces; data handling; data mining; scientific information systems; EDO harmony; Hilbert space filling curve; checkpoint-restart data; data groups distribution; data organization strategies; elastic data organization; file system; flexible data ordering algorithm; multidimensional dataset; scientific application; Arrays; Bandwidth; Concurrent computing; Filling; Hilbert space; Laboratories; Organizations; ADIOS; Data Organization; Parallel I/O; Planar Read Patterns; Space Filling Curve;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
Type :
conf
DOI :
10.1109/CLUSTER.2011.18
Filename :
6061044