Title :
Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS
Author :
Liu, Xuhui ; Han, Jizhong ; Zhong, Yunqin ; Han, Chengde ; He, Xubin
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
Date :
Aug. 31 2009-Sept. 4 2009
Abstract :
The Hadoop framework has been widely used in various clusters to build large-scale, high-performance systems. However, the Hadoop distributed file system (HDFS) is designed to manage large files and suffers a performance penalty when managing a large number of small files. As a consequence, many web applications, such as WebGIS, may not benefit from Hadoop. In this paper, we propose an approach to optimize the I/O performance of small files on HDFS. The basic idea is to combine small files into large ones to reduce the number of files and to build an index for each file. Furthermore, some novel features, such as grouping neighboring files and retaining several of the latest versions of data, are considered to match the characteristics of WebGIS access patterns. Preliminary experimental results show that our approach achieves better performance.
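The merge-and-index idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses an in-memory byte stream in place of HDFS, the function names `pack_small_files` and `read_small_file` and the tile file names are hypothetical, and sorting by name is only a crude stand-in for grouping neighboring files.

```python
import io
from typing import BinaryIO, Dict, Iterable, Tuple

# Index maps a small file's name to its (offset, length) in the merged stream.
Index = Dict[str, Tuple[int, int]]


def pack_small_files(files: Iterable[Tuple[str, bytes]], out: BinaryIO) -> Index:
    """Append each small file to one large stream and record where it landed."""
    index: Index = {}
    offset = 0
    for name, data in files:
        out.write(data)
        index[name] = (offset, len(data))
        offset += len(data)
    return index


def read_small_file(name: str, index: Index, merged: BinaryIO) -> bytes:
    """Fetch one small file with a single seek and read on the merged stream."""
    offset, length = index[name]
    merged.seek(offset)
    return merged.read(length)


# Hypothetical map tiles (assumption: names and contents are made up).
tiles = [("tile_0_1.png", b"BBBB"), ("tile_0_0.png", b"AAA"), ("tile_1_0.png", b"CC")]

# BytesIO stands in for a single large file on HDFS (assumption).
merged = io.BytesIO()
index = pack_small_files(sorted(tiles), merged)
```

With this layout the file system tracks one large file instead of many small ones, and a lookup such as `read_small_file("tile_0_1.png", index, merged)` costs one index probe plus one seek.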
Keywords :
Web services; geographic information systems; network operating systems; HDFS framework; Hadoop distributed file system; WebGIS; file I/O performance; File servers; File systems; High performance computing; Large-scale systems; Middleware; Open source software; Scalability; Web and internet services; Web server; HDFS; Hadoop; Small File I/O Performance
Conference_Title :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-5011-4
ISSN :
1552-5244
DOI :
10.1109/CLUSTR.2009.5289196