• DocumentCode
    3765470
  • Title
    Implementation of the KNN algorithm based on Hadoop
  • Author
    Shengpeng Lu; Weiqin Tong; Zuanjian Chen
  • Author_Institution
    School of Computer Engineering and Science, Shanghai University, Shanghai, P.R. China
  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    123
  • Lastpage
    126
  • Abstract
    The K-Nearest Neighbors (KNN) algorithm is simple, effective, and linear in the field of text classification. The major constraint of the KNN algorithm is its time complexity. Hadoop provides distributed processing of large data sets across clusters of computers using simple programming models. In this paper, the KNN algorithm is improved by implementing it on Hadoop, taking advantage of distributed processing and the linear nature of the KNN algorithm. Speedups are compared across different numbers of nodes for each of several data sizes. The experimental results show that the parallel KNN algorithm achieves a good speedup curve when at least three nodes are used. This implementation can also extend the scope of the KNN algorithm.
  • Publisher
    IET
  • Conference_Titel
    Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on
  • Type
    conf
  • DOI
    10.1049/cp.2015.0265
  • Filename
    7446448
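
The abstract describes distributing KNN over a cluster but gives no implementation detail. As a rough illustration only, the sketch below shows one common way to split KNN into a map step and a reduce step: each partition of the training data reports its local k nearest neighbours, and a single merge step picks the global k and takes a majority vote. It is plain Python, not Hadoop code, and is not the authors' method. The function names (`map_local_top_k`, `reduce_vote`, `knn_classify`), the Euclidean distance, and the majority vote are all assumptions made for this example.

```python
import heapq
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def map_local_top_k(partition, query, k):
    """Map step (hypothetical): k nearest (distance, label) pairs in one partition."""
    scored = ((dist(point, query), label) for point, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_vote(local_results, k):
    """Reduce step (hypothetical): merge local candidates, vote among the global k nearest."""
    candidates = (pair for result in local_results for pair in result)
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_classify(partitions, query, k):
    """Run the map step on every partition, then reduce to a single label."""
    local_results = [map_local_top_k(p, query, k) for p in partitions]
    return reduce_vote(local_results, k)


# Toy training data split into two partitions, standing in for cluster nodes.
partitions = [
    [((0.0, 0.0), "a"), ((1.0, 0.0), "a")],
    [((5.0, 5.0), "b"), ((6.0, 5.0), "b"), ((0.0, 1.0), "a")],
]
label_near_origin = knn_classify(partitions, (0.2, 0.1), k=3)
label_near_five = knn_classify(partitions, (5.5, 5.0), k=3)
```

Because the global k nearest neighbours are always contained in the union of the per-partition top-k lists, the merged result matches what a single-machine KNN would return on the same data.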