Title :
Large Data Clustering Using Quadratic Programming: A Comprehensive Quantitative Analysis
Author :
Alireza Chakeri;Lawrence O. Hall
Author_Institution :
Comput. Sci. &
Abstract :
We address the space complexity challenge of large graph clustering with quadratic programming, and present a comprehensive technical analysis and an alternative solution based on game-theoretic concepts. We develop an approximate solution for clustering graphs with a large number of vertices in order to overcome the space complexity issues. In particular, the quadratic programming formulation requires the edge weights between every pair of vertices, which proves practically intractable for large data sets. Our scalable method divides a graph into disjoint subgraphs of tractable size, whose clusters are enumerated using a novel search of the solution space. Then, the clusters obtained in each subgraph are grouped using a low-resolution ensemble clustering method. The exact maxima of the quadratic programming problem on the entire graph are approximated by the maxima on the subsets of the graph. Finally, vertices are assigned to the final clusters using a linear game-theoretic relation. We also pose the question "How can a cluster of a subset of a dataset be a cluster of the entire dataset?". We show that, in the quadratic programming framework, this problem is coNP-hard. Hence, we modify the definition of a cluster from a stable concept to a non-stable but optimal one (Nash equilibrium), which makes it computationally practical to find clusters in graphs with large numbers of vertices. On the Berkeley Segmentation Dataset, the proposed method achieves results comparable to the state of the art, providing a parallel framework for image segmentation.
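The abstract does not say which formulation or solver is used for the per-subgraph quadratic program. As a minimal sketch, assume the standard simplex-constrained program (maximize x^T A x subject to x >= 0 and sum(x) = 1, with A the symmetric edge-weight matrix) and discrete replicator dynamics as the solver, a common game-theoretic choice for this kind of problem. The function names, the toy matrix, and the support threshold are all hypothetical. The partitioning into subgraphs, the ensemble grouping, and the final vertex assignment are not shown.

```python
def replicator_dynamics(A, iters=1000, tol=1e-10):
    """Assumed solver: iterate x_i <- x_i * (A x)_i / (x^T A x),
    starting from the barycenter of the simplex."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        if xAx == 0.0:
            break  # no edges carry weight under x; nothing to update
        new_x = [x[i] * Ax[i] / xAx for i in range(n)]
        delta = sum(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if delta < tol:
            break
    return x


def support(x, eps=1e-4):
    """Vertices with non-negligible weight form the extracted cluster."""
    return [i for i, v in enumerate(x) if v > eps]


# Hypothetical toy subgraph: vertices 0-2 are tightly connected,
# vertices 3-4 form a weaker pair loosely tied to the rest.
A = [
    [0.0, 1.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 1.0, 0.1, 0.1],
    [1.0, 1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.5],
    [0.1, 0.1, 0.1, 0.5, 0.0],
]

x = replicator_dynamics(A)
cluster = support(x)
print(cluster)
```

Note that this sketch stores the full n-by-n weight matrix, which is exactly the space cost the abstract identifies as intractable for large graphs; that is why it would only be run on one tractable-size subgraph at a time.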
Keywords :
"Games","Nash equilibrium","Sociology","Statistics","Quadratic programming","Symmetric matrices"
Conference_Title :
Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
Electronic_ISBN :
2375-9259
DOI :
10.1109/ICDMW.2015.151