Title :
SQL-MapReduce hybrid approach towards distributed projected clustering
Author :
Harikumar, Sandhya ; Shyju, M. ; Kaimal, M.R.
Author_Institution :
Dept. of Comput. Sci. & Eng., Amrita Vishwa Vidyapeetham, Kollam, India
Abstract :
Clustering high-dimensional data is a major challenge in data mining because of the inherent complexity and sparsity of the data. Projected clustering is one of the clustering approaches that determine clusters in subspaces of such high-dimensional data. However, projected clustering within a DBMS is computationally expensive in both time and space when the volume of records reaches terabytes, petabytes, or more. This cost becomes a hurdle especially when clustering of transactional data is used as a preprocessing step for other tasks such as frequent decision making, efficient indexing, and compression. Hence, parallelizing and distributing expensive data clustering tasks becomes attractive, both for the speed-up in computation and for the larger amount of memory available in a computing cluster. In order to achieve this, we propose a SQL-MapReduce hybrid approach for scalable projected clustering.
Keywords :
SQL; parallel programming; pattern clustering; transaction processing; DBMS; SQL-MapReduce hybrid approach; computation speed-up; computing cluster; data clustering task distribution; data clustering task parallelization; data complexity; data mining; data sparsity; distributed projected clustering; high-dimensional data clustering; memory improvement; scalable projected clustering; transactional data; Algorithm design and analysis; Clustering algorithms; Data analysis; Distributed databases; Merging; Relational databases; Distributed computing; Hadoop; Hive; MapReduce; Parallel computing; Projected clustering;
Conference_Title :
Data Science & Engineering (ICDSE), 2014 International Conference on
Conference_Location :
Kochi
Print_ISBN :
978-1-4799-6870-1
DOI :
10.1109/ICDSE.2014.6974605