Title :
Can Shared Nearest Neighbors Reduce Hubness in High-Dimensional Spaces?
Author :
Flexer, Arthur ; Schnitzer, Dan
Author_Institution :
Austrian Res. Inst. for Artificial Intell., Vienna, Austria
Abstract :
'Hubness' is a recently discovered general problem of machine learning in high-dimensional data spaces. Hub objects have a small distance to an exceptionally large number of data points, while anti-hubs are far from all other data points. Hubness is related to the concentration of distances, which impairs the contrast of distances in high-dimensional spaces. Computing secondary distances inspired by shared nearest neighbor (SNN) approaches has been shown to reduce hubness and concentration, and some work already exists on the direct application of SNN in the context of hubness in image recognition. This study applies SNN to a larger number of high-dimensional real-world data sets from diverse domains and compares it to two other secondary distance approaches (local scaling and mutual proximity). SNN is shown to reduce hubness, but less than the other approaches, and, unlike its competitors, it improves classification accuracy for only half of the data sets.
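The notions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: hubness is summarized as the skewness of the k-occurrence counts (how often each point appears in other points' k-nearest-neighbor lists), and the SNN secondary distance is taken as one minus the fraction of shared k-nearest neighbors. The function names, the neighborhood size k, and the random Gaussian data are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_indices(dist, k):
    """Indices of the k nearest neighbors of each point, excluding the point itself."""
    d = np.array(dist, dtype=float)   # copy so the caller's matrix is not modified
    np.fill_diagonal(d, np.inf)       # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def k_occurrence(dist, k):
    """How often each point appears in the k-NN lists of the other points."""
    n = dist.shape[0]
    return np.bincount(knn_indices(dist, k).ravel(), minlength=n)


def skewness(values):
    """Third standardized moment; assumed here as the hubness summary statistic."""
    x = np.asarray(values, dtype=float)
    std = x.std()
    if std == 0:
        return 0.0
    return float(np.mean((x - x.mean()) ** 3) / std ** 3)


def snn_distance(dist, k):
    """Assumed SNN secondary distance: 1 - |NN_k(x) & NN_k(y)| / k."""
    n = dist.shape[0]
    nn = knn_indices(dist, k)
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], nn] = 1   # member[i, j] = 1 if j is in NN_k(i)
    shared = member @ member.T              # shared[i, j] = number of common neighbors
    d = 1.0 - shared / k
    np.fill_diagonal(d, 0.0)
    return d


# Hypothetical data: 200 points drawn from a 50-dimensional standard Gaussian.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
primary = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
secondary = snn_distance(primary, k=10)

print("hubness (primary distances):", skewness(k_occurrence(primary, 5)))
print("hubness (SNN distances):    ", skewness(k_occurrence(secondary, 5)))
```

Comparing the two printed skewness values shows whether the secondary distance changed the hubness of this particular sample; the sketch makes no claim about which value will be smaller, and it does not reproduce the paper's data sets or its comparison with local scaling and mutual proximity.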
Keywords :
data handling; learning (artificial intelligence); pattern classification; SNN; data points; high dimensional data spaces; image recognition; machine learning; real world data sets; shared nearest neighbors; Accuracy; Conferences; Context; Electronic mail; Histograms; Image recognition; Standards; curse of dimensionality; hubness; machine learning; shared nearest neighbors;
Conference_Titel :
Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
DOI :
10.1109/ICDMW.2013.101