Regional Information Center for Science and Technology - K-Means Clustering with Bagging and MapReduce

DocumentCode :

2589583

Title :

K-Means Clustering with Bagging and MapReduce

Author :

Li, Hai-Guang ; Wu, Gong-Qing ; Hu, Xue-Gang ; Zhang, Jing ; Li, Lian ; Wu, Xindong

Author_Institution :

Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China

fYear :

2011

fDate :

4-7 Jan. 2011

Firstpage :

Lastpage :

Abstract :

Clustering is one of the most widely used techniques for exploratory data analysis. Across all disciplines, from the social sciences through biology to computer science, people try to get a first intuition about their data by identifying meaningful groups among the data objects. K-means is one of the best-known clustering algorithms. Its simplicity and speed allow it to run on large data sets. However, it also has several drawbacks. First, the algorithm is unstable and sensitive to outliers. Second, it is inefficient when dealing with large data sets. This paper proposes a method that addresses both problems: it uses bagging, an ensemble learning method, to overcome the instability and sensitivity to outliers, and MapReduce, a distributed computing framework, to address the inefficiency of clustering large data sets. Extensive experiments have been performed to show that the approach is efficient.
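
The combination described in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the abstract does not say how the per-bag results are merged, so the merge step here (pooling all per-bag centroids and clustering them again) is an assumption, as are all function names. Python's built-in `map` and `functools.reduce` stand in for a real MapReduce framework.

```python
import random
from functools import reduce


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; needs at least k distinct points."""
    rng = random.Random(seed)
    # Start from distinct points so duplicated bootstrap rows
    # cannot produce two identical initial centroids.
    centroids = rng.sample(sorted(set(points)), k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def map_bag(task):
    """Map step (hypothetical): bootstrap one bag and cluster it."""
    points, k, seed = task
    rng = random.Random(seed)
    bag = rng.choices(points, k=len(points))  # sample with replacement
    return kmeans(bag, k, seed=seed)


def reduce_pool(pooled, centroids):
    """Reduce step (assumed merge rule): collect every bag's centroids."""
    return pooled + centroids


def bagged_kmeans(points, k, n_bags=10):
    """Cluster n_bags bootstrap samples, then cluster the pooled centroids."""
    tasks = [(points, k, seed) for seed in range(n_bags)]
    pooled = reduce(reduce_pool, map(map_bag, tasks), [])
    return kmeans(pooled, k)
```

Calling `bagged_kmeans(data, k=2)` on a list of coordinate tuples returns `k` merged centroids. In a distributed setting each `map_bag` call would run on a separate worker, which is where the efficiency gain on large data sets would come from; here the calls run sequentially.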

Keywords :

bagging; data analysis; distributed processing; pattern clustering; K-means clustering algorithm; MapReduce; computer science; data analysis; data object; data set; distributed computing; ensemble learning method bagging; social sciences; Algorithm design and analysis; Bagging; Clustering algorithms; Computer science; Machine learning algorithms; Merging; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

System Sciences (HICSS), 2011 44th Hawaii International Conference on

Conference_Location :

Kauai, HI

ISSN :

1530-1605

Print_ISBN :

978-1-4244-9618-1

Type :

conf

DOI :

10.1109/HICSS.2011.265

Filename :

5718506

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2589583