  • DocumentCode
    1934943
  • Title
    Parallel k-modes algorithm based on MapReduce
  • Author
    Guo Tao; Ding Xiangwu; Li Yefeng
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
  • fYear
    2015
  • fDate
    3-5 Feb. 2015
  • Firstpage
    176
  • Lastpage
    179
  • Abstract
    K-modes is a typical categorical clustering algorithm. First, we improve the k-modes procedure: while categorical objects are allocated to clusters, the count of each attribute value in every cluster is updated, so that the new cluster modes can be computed after reading the whole dataset only once. To make k-modes capable of handling large-scale categorical data, we then implement it on Hadoop using the MapReduce parallel computing model. Experiments show that parallel k-modes achieves a good speedup ratio on large-scale categorical data.
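    The single-pass mode update and its MapReduce form can be illustrated with a minimal in-process Python sketch. It is not the authors' Hadoop implementation. All function names are hypothetical, MapReduce is only simulated (a dict stands in for the shuffle), dissimilarity is assumed to be simple matching, each mapper is assumed to keep one count table per cluster for its input split (in-mapper combining), and ties are assumed to go to the lowest cluster index and the lexicographically smallest value.

```python
from collections import Counter, defaultdict


def dissimilarity(x, mode):
    """Simple matching dissimilarity (assumed): number of attributes that differ."""
    return sum(a != b for a, b in zip(x, mode))


def map_split(split, modes):
    """Hypothetical mapper: assign each record of one input split to its nearest
    mode and update that cluster's per-attribute value counts as it goes."""
    m = len(modes[0])
    counts = defaultdict(lambda: [Counter() for _ in range(m)])
    for rec in split:
        # Nearest mode; ties go to the lowest cluster index.
        cid = min(range(len(modes)), key=lambda j: dissimilarity(rec, modes[j]))
        for i, v in enumerate(rec):
            counts[cid][i][v] += 1
    # Emit one (cluster_id, count table) pair per cluster seen in this split.
    for cid, table in counts.items():
        yield cid, table


def reduce_cluster(tables):
    """Hypothetical reducer: merge partial count tables for one cluster and take
    the most frequent value of each attribute as the new mode."""
    merged = [Counter() for _ in tables[0]]
    for table in tables:
        for total, part in zip(merged, table):
            total.update(part)  # Counter.update adds counts
    # Highest count wins; equal counts resolve to the smallest value.
    return tuple(min(c, key=lambda v: (-c[v], v)) for c in merged)


def kmodes_iteration(splits, modes):
    """One k-modes iteration simulated as map -> shuffle -> reduce (no Hadoop)."""
    shuffled = defaultdict(list)  # cluster_id -> partial count tables
    for split in splits:
        for cid, table in map_split(split, modes):
            shuffled[cid].append(table)
    new_modes = list(modes)  # an empty cluster keeps its previous mode
    for cid, tables in shuffled.items():
        new_modes[cid] = reduce_cluster(tables)
    return new_modes


def kmodes(splits, initial_modes, max_iter=10):
    modes = list(initial_modes)
    for _ in range(max_iter):
        new_modes = kmodes_iteration(splits, modes)
        if new_modes == modes:  # converged: no mode changed
            break
        modes = new_modes
    return modes


# Usage on toy data split into two hypothetical input splits.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "small", "square"),
]
splits = [data[:3], data[3:]]
modes = kmodes(splits, [data[0], data[3]])
print(modes)  # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

    Because each mapper emits count tables rather than raw records, the reducer for a cluster only merges counts and picks the most frequent value per attribute, so each iteration needs a single pass over the data. In a real Hadoop job the current modes would also have to be made available to every mapper; here they are simply passed as an argument.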
  • Keywords
    parallel processing; pattern clustering; Hadoop; MapReduce parallel computing model; attribute item; categorical clustering algorithm; large-scale categorical data; parallel k-modes algorithm; speedup ratio; Clustering algorithms; Computational modeling; Computers; Data models; Educational institutions; Parallel processing; Servers; MapReduce; categorical data; k-modes; parallel clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information, Networking, and Wireless Communications (DINWC), 2015 Third International Conference on
  • Conference_Location
    Moscow
  • Print_ISBN
    978-1-4799-6375-1
  • Type
    conf
  • DOI
    10.1109/DINWC.2015.7054238
  • Filename
    7054238