Title :
Index Selection on MapReduce Relational-Databases
Author :
Alsayoud, Fatimah ; Miri, Ali
Author_Institution :
Dept. of Comput. Sci., Ryerson Univ., Toronto, ON, Canada
Date :
March 30 2015-April 2 2015
Abstract :
The physical design of data storage is a critical administrative factor in optimizing system performance. Building indices can improve system performance, but creating many random indices can degrade it and waste storage space. Selecting indices properly is therefore a fundamental, though often complex, part of system design optimization. Index-selection optimization techniques have been widely studied in Database Management Systems (DBMSs), but they have not received the same attention in MapReduce Relational-Databases. This paper focuses on the index-selection process in Hadoop-database hybrid systems. The main contribution is the use of data mining techniques to develop a tool for determining optimal index-set configurations. An overall evaluation shows that the index configurations recommended by the tool achieved an average performance gain of up to 48% across the analytical tasks performed.
Keywords :
data handling; data mining; parallel processing; relational databases; DBMS; Hadoop-database hybrid systems; MapReduce relational-databases; critical administrative factor; data mining techniques; data storage; database management system; index-selection optimization techniques; index-selection process; optimal index-set configurations; physical design; system design optimization; system performance; Big data; Indexes; Itemsets; Frequent Itemset; Hadoop; Index-Selection; MapReduce
Conference_Title :
Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
Conference_Location :
Redwood City, CA
DOI :
10.1109/BigDataService.2015.23