DocumentCode :
3459485
Title :
The Use of Locality Information on Data Intensive Parallel File Systems
Author :
Ryoiti Sugawara Junior, Ricardo ; Matsumoto Sato, Liria
Author_Institution :
Dept. of Comput. & Digital Syst. Eng., Univ. of Sao Paulo, Sao Paulo, Brazil
fYear :
2013
fDate :
3-5 Dec. 2013
Firstpage :
167
Lastpage :
173
Abstract :
Many recent data-intensive parallel systems are built with cost-effective hardware and combine compute and storage facilities. Since bandwidth-bisecting networks are the norm, distributing jobs near the data provides significant performance improvements. However, data locality information is not easily available to the programmer: obtaining it requires interacting with file system internals, or adopting custom programming and run-time frameworks that provide locality-aware job scheduling, such as MapReduce and Hadoop. In this paper, we present a parallel file system implementation combined with virtual files, presented as text, that can be queried for locality data or written to control the placement of new data blocks. This makes it simpler for software that does not adhere to the MapReduce model to benefit from computing near the data. To evaluate the proposed approach, a number of tests were run on an initial implementation using fast disks, with locality-aware cases showing 2 to 9 times faster reads and higher processor utilization.
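The abstract describes querying virtual text files for locality data and using that data to run work near the data. The record does not specify the file format, so the following is only a minimal sketch under a hypothetical assumption: each line of the locality listing reads `<offset> <length> <host>[,<host>...]`. The names `parse_locality` and `schedule` are illustrative, not the paper's API. The scheduler assigns each block to the least-loaded worker that holds a replica, and falls back to the least-loaded worker overall when no replica is local.

```python
def parse_locality(text):
    """Parse a HYPOTHETICAL locality listing (format assumed, not from the paper):
    one block per line as '<offset> <length> <host>[,<host>...]'; '#' starts a comment."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        offset, length, hosts = line.split()
        blocks.append((int(offset), int(length), hosts.split(",")))
    return blocks


def schedule(blocks, workers):
    """Assign each block to the least-loaded worker holding a replica of it;
    if no worker holds a replica, use the least-loaded worker overall.
    Returns {offset: (worker, is_local)}."""
    load = {w: 0 for w in workers}  # bytes assigned so far per worker
    plan = {}
    for offset, length, hosts in blocks:
        local = [h for h in hosts if h in load]  # replica holders that are workers
        candidates = local or list(load)  # fall back to a remote read
        chosen = min(candidates, key=lambda h: (load[h], h))  # ties broken by name
        load[chosen] += length
        plan[offset] = (chosen, bool(local))
    return plan


# In a real deployment the text would come from reading the virtual file,
# e.g. open(<locality file path>).read(); here a sample string is used instead.
sample = """
# offset length replicas
0 67108864 node1,node2
67108864 67108864 node2,node3
134217728 67108864 node3,node1
"""
blocks = parse_locality(sample)
plan = schedule(blocks, ["node1", "node2", "node3"])
```

With the sample listing, each 64 MiB block lands on a different node that stores it, so every read in the plan is local. Balancing by assigned bytes is one simple choice; a real scheduler might also weigh CPU load or network distance.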
Keywords :
data handling; file organisation; parallel processing; Hadoop; Mapreduce model; custom programming; data intensive parallel file systems; data locality information; run-time frameworks; storage facilities; virtual files; Bandwidth; Fuses; Instruction sets; Kernel; Registers; Servers; Sockets; Data Intensive; Hadoop; Locality; Parallel File System; Virtual File System;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/CSE.2013.35
Filename :
6755213