DocumentCode
1934943
Title
Parallel k-modes algorithm based on MapReduce
Author
Guo Tao ; Ding Xiangwu ; Li Yefeng
Author_Institution
Coll. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
fYear
2015
fDate
3-5 Feb. 2015
Firstpage
176
Lastpage
179
Abstract
K-modes is a typical clustering algorithm for categorical data. We first improve the K-modes procedure: while categorical objects are allocated to clusters, the count of each attribute value in each cluster is updated, so that the new cluster modes can be computed after reading the whole dataset once. To make K-modes capable of handling large-scale categorical data, we then implement it on Hadoop using the MapReduce parallel computing model. Experiments show that parallel k-modes achieves a good speedup ratio when dealing with large-scale categorical data.
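The single-pass mode update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`hamming`, `kmodes_pass`, `merge_counts`), the use of simple matching (Hamming) dissimilarity, and the rule of keeping the old mode for an empty cluster are all assumptions made for illustration. The `merge_counts` helper only shows why per-cluster counts suit a MapReduce-style reduction (partial counts from separate data splits add up); how the paper actually divides work between map and reduce is not stated in the abstract.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def kmodes_pass(records, modes):
    """One pass over the data: assign each record to its nearest mode and
    update per-cluster, per-attribute value counts at the same time."""
    n_attrs = len(modes[0])
    # counts[c][j] maps each value of attribute j to its frequency in cluster c
    counts = [[Counter() for _ in range(n_attrs)] for _ in modes]
    labels = []
    for rec in records:
        c = min(range(len(modes)), key=lambda i: hamming(rec, modes[i]))
        labels.append(c)
        for j, value in enumerate(rec):
            counts[c][j][value] += 1
    return labels, counts


def modes_from_counts(counts, old_modes):
    """New mode of a cluster = most frequent value of every attribute.
    An empty cluster keeps its old mode (an assumption of this sketch)."""
    new_modes = []
    for cluster_counts, old in zip(counts, old_modes):
        if cluster_counts[0]:
            new_modes.append(tuple(cnt.most_common(1)[0][0] for cnt in cluster_counts))
        else:
            new_modes.append(tuple(old))
    return new_modes


def merge_counts(a, b):
    """Combine partial counts from two data splits (hypothetical reduce step)."""
    return [[ca + cb for ca, cb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


records = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "z"), ("c", "z")]
modes = [("a", "x"), ("b", "z")]
labels, counts = kmodes_pass(records, modes)
new_modes = modes_from_counts(counts, modes)
```

Because the counts are plain frequency tables, running `kmodes_pass` on two halves of `records` with the same `modes` and combining the results with `merge_counts` gives the same counts as a pass over the full list.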
Keywords
parallel processing; pattern clustering; Hadoop; MapReduce parallel computing model; attribute item; categorical clustering algorithm; large-scale categorical data; parallel k-modes algorithm; speedup ratio; Clustering algorithms; Computational modeling; Computers; Data models; Educational institutions; Parallel processing; Servers; MapReduce; categorical data; k-modes; parallel clustering;
fLanguage
English
Publisher
ieee
Conference_Titel
Digital Information, Networking, and Wireless Communications (DINWC), 2015 Third International Conference on
Conference_Location
Moscow
Print_ISBN
978-1-4799-6375-1
Type
conf
DOI
10.1109/DINWC.2015.7054238
Filename
7054238