Title :
Multi-dimensional Index on Hadoop Distributed File System
Author :
Liao, Haojun ; Han, Jizhong ; Fang, Jinyun
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
Abstract :
In this paper, we present an approach to constructing built-in, block-based hierarchical index structures, such as the R-tree, that organize data sets in one-, two-, or higher-dimensional space and improve query performance for common query types (e.g., point queries and range queries) on the Hadoop Distributed File System (HDFS). With an index structure in place, queries avoid an exhaustive search of the corresponding data sets, which can significantly reduce the query response time for data stored in HDFS. The basic idea is to adapt the conventional hierarchical structure to HDFS. Several issues, including index organization, index node size, buffer management, and the data transfer protocol, are considered in order to reduce query response time and the overhead of transferring data over the network. Experimental evaluation demonstrates that the built-in index structure improves query performance and can serve as a cornerstone for structured or semi-structured data management.
Keywords :
data structures; distributed databases; query processing; Hadoop distributed file system; buffer management; built-in block-based hierarchical index structure; data transfer overhead; data transfer protocol; index node size; index organization; multidimensional index; query performance; query response time; query types; semi-structured data management; Indexes; Protocols; Servers; Time factors; HDFS; Hadoop; Multi-dimensional index
Conference_Title :
Networking, Architecture and Storage (NAS), 2010 IEEE Fifth International Conference on
Conference_Location :
Macau
Print_ISBN :
978-1-4244-8133-0
DOI :
10.1109/NAS.2010.44