Title :
Local representativeness in vector data
Author :
Zehnalova, Sarka ; Kudelka, Milos ; Platos, Jan
Author_Institution :
VSB - Tech. Univ. of Ostrava, Ostrava, Czech Republic
Abstract :
The amount of large-scale real-world data is growing rapidly, as is the need to reduce its size by obtaining a representative sample. Such a sample allows us to apply a wide variety of analytical methods whose direct application to the original data would be infeasible. Conventional sampling methods produce non-deterministic results while trying to preserve selected characteristics of the input dataset. We present a novel, simple, and deterministic approach with the same goal. It is not sampling in the true sense but a reduction of vector data that preserves internal data structures (clusters and density) very well. The approach is based on analyzing nearest neighbors: our proposed x-representativeness takes into account the local density of the data and the nearest neighbors of individual data objects. We also present experiments with two different datasets. The aim of these experiments is to show that x-representativeness can be used to deterministically reduce the datasets to differently sized samples of representatives while maintaining the properties of the original datasets.
Keywords :
data compression; data mining; data structures; sampling methods; analytical methods; data clusters; data objects; data size reduction; deterministic approach; deterministic dataset reduction; internal data structures; large-scale real data; local data density; local representativeness; nearest neighbor analysis; vector data reduction; x-representativeness; Cities and towns; Clustering algorithms; Complexity theory; Equations; Vectors; density bias; nearest neighbor; sampling
Conference_Title :
Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on
Conference_Location :
San Diego, CA
DOI :
10.1109/SMC.2014.6974025