Title :
Distributed Pivot Clustering with arbitrary distance functions
Author :
Branting, L. Karl
Abstract :
This paper describes an algorithm, Distributed Pivot Clustering (DPC), that differs from prior distributed clustering algorithms in that it requires neither an inexpensive approximation of the actual distance function nor that pairs of elements in the same cluster share at least one exact feature value. Instead, DPC requires only that the distance function satisfy the triangle inequality and be of sufficiently high granularity to permit the data to be partitioned into canopies of optimal size based on distance to reference elements, or pivots. An empirical evaluation demonstrated that DPC can lead to accurate distributed hierarchical agglomerative clustering provided that the triangle inequality and granularity requirements are met.
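The role of the triangle inequality can be illustrated with a small sketch. It is not the paper's procedure: the abstract does not say how pivots are chosen or how canopy sizes are optimized, so the function name `build_canopies` and the parameters `width` and `threshold` are hypothetical. The sketch relies only on the fact that, for any metric, |d(x, p) - d(y, p)| <= d(x, y), so two elements within `threshold` of each other must lie at similar distances from the pivot p and therefore fall into a common overlapping band.

```python
from collections import defaultdict


def build_canopies(items, pivot, dist, width, threshold):
    """Group items into overlapping bands of distance to a single pivot.

    Hypothetical illustration, not the DPC algorithm itself.
    Band k covers pivot distances in [k * width, (k + 1) * width + threshold).
    If dist satisfies the triangle inequality, any two items within
    `threshold` of each other share at least one band, so each band can be
    clustered independently (for example on a separate worker).
    """
    canopies = defaultdict(list)
    for item in items:
        d = dist(item, pivot)
        highest = int(d // width)                         # last band starting at or below d
        lowest = max(0, int((d - threshold) // width))    # first band still reaching d
        for band in range(lowest, highest + 1):
            canopies[band].append(item)
    return dict(canopies)


# Toy usage: points on a line, absolute difference as the metric.
points = [1, 9, 11, 12, 25]
canopies = build_canopies(points, pivot=0, dist=lambda a, b: abs(a - b),
                          width=10, threshold=3)
```

In this toy run, 9 and 11 are only 2 apart but straddle the boundary at 10; the overlap of size `threshold` places 11 in both band 0 and band 1, so the pair is still compared. A coarse distance function with few distinct values would leave most items at the same pivot distance and defeat this kind of banding, which is one way to read the granularity requirement.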
Keywords :
distributed algorithms; pattern clustering; DPC algorithm; arbitrary distance functions; distributed hierarchical agglomerative clustering; distributed pivot clustering algorithms; empirical evaluation; feature value; granularity requirements; triangle inequality; Accuracy; Approximation algorithms; Approximation methods; Clustering algorithms; Histograms; Indexes; Vectors
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691729