• DocumentCode
    3759374
  • Title
    K-Means Clustering Algorithm for Large-Scale Chinese Commodity Information Web Based on Hadoop
  • Author
    Geng Yushui;Zhang Lishuo
  • Author_Institution
    Sch. of Inf., Qilu Univ. of Technol., Jinan, China
  • fYear
    2015
  • Firstpage
    256
  • Lastpage
    259
  • Abstract
    With the growing popularity of the Internet, product information now fills a great many web pages, and clustering is a natural way to extract the information a user needs from them. The explosive growth of data, however, means that this information must be stored at massive scale, and clustering faces problems such as high computational complexity and long running times. The traditional K-Means clustering algorithm therefore no longer meets the needs of today's large-data environments. This article combines the advantages of the Hadoop platform and the MapReduce programming model to propose a K-Means clustering algorithm for large-scale Chinese commodity information Web pages based on Hadoop. The Map function calculates the distance from each sample to the cluster centers and labels each sample with its category; the Reduce function summarizes the intermediate results and calculates new cluster centers for the next round of iteration. Experimental results show that this method improves clustering processing speed.
  • Keywords
    "Clustering algorithms","Algorithm design and analysis","Parallel processing","Distributed databases","Computational modeling","Data models","Web pages"
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing and Applications for Business Engineering and Science (DCABES), 2015 14th International Symposium on
  • Type
    conf
  • DOI
    10.1109/DCABES.2015.71
  • Filename
    7429605
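
The Map and Reduce roles described in the abstract can be sketched as one K-Means iteration in plain Python. This is a minimal illustration, not the authors' Hadoop implementation: the function names (kmeans_map, kmeans_reduce, kmeans_iteration) are hypothetical, Euclidean distance is an assumption (the abstract does not name the metric), and samples are assumed to be already converted to numeric feature vectors. An in-memory dictionary stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(sample, centers):
    """Map step: label a sample with the index of its nearest center."""
    nearest = min(range(len(centers)), key=lambda i: dist(sample, centers[i]))
    return nearest, sample


def kmeans_reduce(samples):
    """Reduce step: average the samples assigned to one center."""
    count = len(samples)
    dims = len(samples[0])
    return tuple(sum(s[d] for s in samples) / count for d in range(dims))


def kmeans_iteration(samples, centers):
    """Run one map/shuffle/reduce round and return the new centers."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for sample in samples:
        key, value = kmeans_map(sample, centers)
        groups[key].append(value)
    # Assumption: a center that received no samples keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]


# Toy data, invented for illustration only.
samples = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
new_centers = kmeans_iteration(samples, centers)
print(new_centers)
```

In a full run, kmeans_iteration would be repeated with the new centers as input until they stop changing or an iteration limit is reached, which corresponds to the "next round of iteration" mentioned in the abstract.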