  • DocumentCode
    704143
  • Title
    Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce
  • Author
    Song, Ge ; Rochas, Justine ; Huet, Fabrice ; Magoules, Frederic

  • Author_Institution
    I3S, Univ. Nice Sophia Antipolis, Sophia Antipolis, France
  • fYear
    2015
  • fDate
    4-6 March 2015
  • Firstpage
    279
  • Lastpage
    287
  • Abstract
    Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model, because it is suitable for large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties, and there is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. Firstly, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Secondly, we describe precisely the different techniques published so far. Lastly, we provide a set of objective criteria that can be used to make informed decisions.
  • Keywords
    data handling; distributed processing; pattern classification; MapReduce programming model; classification; distributed approach; k nearest neighbor join processing; kNN implementations; large scale data processing; massive data; Accuracy; Complexity theory; Computational modeling; Data models; Programming; Silicon; Sorting; Data Partition; Hadoop; MapReduce; kNN Join;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing (PDP), 2015 23rd Euromicro International Conference on
  • Conference_Location
    Turku
  • ISSN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2015.79
  • Filename
    7092733
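
The kNN operation defined in the abstract can be illustrated with a small single-machine sketch. This is not any of the MapReduce techniques the paper surveys; it is a brute-force baseline, assuming Euclidean distance and the usual definition of a kNN join (for each point r of a set R, find the k closest points of S). The function names `knn` and `knn_join`, the set R, and the sample points are hypothetical and chosen only for illustration.

```python
import heapq
import math


def knn(p, S, k):
    """Return the k points of S closest to p, nearest first.

    Assumes Euclidean distance; other metrics would work the same way.
    """
    return heapq.nsmallest(k, S, key=lambda s: math.dist(p, s))


def knn_join(R, S, k):
    """Brute-force kNN join: map every point r in R to its k nearest points in S.

    Every pair (r, s) is compared, so the cost grows with |R| * |S|.
    """
    return {r: knn(r, S, k) for r in R}


# Hypothetical sample data: 2-D points as tuples.
S = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
R = [(0.9, 0.9), (4.0, 4.0)]

result = knn_join(R, S, 2)
for r, neighbours in result.items():
    print(r, "->", neighbours)
```

Because the brute-force join compares every pair of points, its cost grows quickly with the size and dimension of the data, which is the motivation the abstract gives for distributed approaches.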