DocumentCode :
1961979
Title :
Deflating the dimensionality curse using multiple fractal dimensions
Author :
Pagel, Bernd-Uwe ; Korn, Flip ; Faloutsos, Christos
Author_Institution :
SAP AG, Walldorf, Germany
fYear :
2000
fDate :
2000
Firstpage :
589
Lastpage :
598
Abstract :
Nearest neighbor queries are important in many settings, including spatial databases (find the k closest cities) and multimedia databases (find the k most similar images). Previous analyses have concluded that nearest neighbor search is hopeless in high dimensions, due to the notorious “curse of dimensionality”. However, their precise analysis over real data sets is still an open problem. The typical and often implicit assumption in previous studies is that the data is uniformly distributed, with independence between attributes. However, real data sets overwhelmingly disobey these assumptions; rather, they typically are skewed and exhibit intrinsic (“fractal”) dimensionalities that are much lower than their embedding dimension, e.g., due to subtle dependencies between attributes. We show how the Hausdorff and correlation fractal dimensions of a data set can yield extremely accurate formulas that can predict I/O performance to within one standard deviation. The practical contributions of this work are our accurate formulas, which can be used for query optimization in spatial and multimedia databases. The theoretical contribution is the “deflation” of the dimensionality curse. Our theoretical and empirical results show that previous worst-case analyses of nearest neighbor search in high dimensions are over-pessimistic, to the point of being unrealistic. The performance depends critically on the intrinsic (“fractal”) dimensionality as opposed to the embedding dimension that the uniformity assumption incorrectly implies.
Keywords :
database theory; fractals; multimedia databases; optimisation; query processing; visual databases; data sets; dimensionality; fractal dimensions; input output performance; multiple fractal dimensions; nearest neighbor queries; nearest neighbor search; query optimization; spatial databases; Cities and towns; Contracts; Electrical capacitance tomography; Electronic switching systems; Fractals; Geographic Information Systems; Independent component analysis; National electric code; Nearest neighbor searches; Query processing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2000. Proceedings. 16th International Conference on
Conference_Location :
San Diego, CA
ISSN :
1063-6382
Print_ISBN :
0-7695-0506-6
Type :
conf
DOI :
10.1109/ICDE.2000.839457
Filename :
839457