Title :
Representative subsets for big data learning using k-NN graphs
Author :
Mall, Raghvendra ; Jumutc, Vilen ; Langone, Rocco ; Suykens, Johan A. K.
Author_Institution :
ESAT/STADIUS, KU Leuven, Leuven, Belgium
Abstract :
In this paper, we propose a deterministic method for obtaining subsets of big data that are representative of the inherent structure in the data. We first convert the large-scale dataset into a sparse undirected k-NN graph using a distributed network generation framework that we propose in this paper. After obtaining the k-NN graph, we apply the fast and unique representative subset (FURS) selection method [1], [2] to deterministically obtain a subset for this big data network. The FURS selection technique selects nodes from different dense regions in the graph, retaining the natural community structure. We then locate the points in the original big data corresponding to the selected nodes and compare the obtained subset with subsets acquired from state-of-the-art subset selection techniques. We evaluate the quality of the selected subset on several synthetic and real-life datasets for different learning tasks, including big data classification and big data clustering.
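The pipeline described in the abstract (data points, then a sparse undirected k-NN graph, then a deterministic selection of well-connected nodes, then a mapping back to the original points) can be illustrated with a small single-machine sketch. This is not the authors' distributed framework, and the selection step is only a simplified degree-based stand-in for FURS, whose exact rules are given in [1], [2] rather than in this abstract. The function names `knn_graph` and `select_hubs`, the brute-force distance computation, and the tie-breaking by node index are all assumptions made for illustration.

```python
import numpy as np


def knn_graph(X, k):
    """Build an undirected k-NN graph by brute force (illustrative only).

    Returns a list of neighbour sets; an edge i-j exists if j is among the
    k nearest neighbours of i or vice versa.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances, O(n^2) memory: small data only.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    nbrs = np.argsort(d, axis=1, kind="stable")[:, :k]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)  # symmetrise to get an undirected graph
    return adj


def select_hubs(adj, m):
    """Deterministically pick m nodes, favouring high degree.

    Simplified stand-in for FURS: within one pass, the neighbours of an
    already chosen node are skipped so that picks spread over different
    dense regions; skipped nodes become eligible again in the next pass.
    """
    n = len(adj)
    m = min(m, n)
    # Sort by descending degree; ties are broken by node index.
    order = sorted(range(n), key=lambda i: (-len(adj[i]), i))
    selected, chosen = [], set()
    while len(selected) < m:
        blocked = set()  # neighbours of nodes chosen in this pass
        for i in order:
            if len(selected) == m:
                break
            if i in chosen or i in blocked:
                continue
            selected.append(i)
            chosen.add(i)
            blocked |= adj[i]
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))   # toy data, not from the paper
    graph = knn_graph(X, k=5)
    idx = select_hubs(graph, m=10)  # indices of the selected nodes
    subset = X[idx]                 # corresponding original points
    print(subset.shape)
```

Because the selected node indices are also row indices of `X`, mapping the subset back to the original data is a plain indexing step. For genuinely large datasets the quadratic distance matrix is infeasible, which is the motivation for a distributed graph construction such as the one the abstract proposes.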
Keywords :
Big Data; graph theory; learning (artificial intelligence); pattern classification; pattern clustering; FURS; big data classification; big data clustering; big data learning; fast and unique representative subset selection method; natural community structure; real-life datasets; representative subsets; sparse undirected k-NN graph; synthetic datasets; Big data; Communities; Entropy; Kernel; Predictive models; Standards; Training;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004210