DocumentCode
2886092
Title
Generalized Expansion Dimension
Author
Houle, Michael E. ; Kashima, Hisashi ; Nett, Michael
Author_Institution
Nat. Inst. of Inf., Tokyo, Japan
fYear
2012
fDate
10 Dec. 2012
Firstpage
587
Lastpage
594
Abstract
In this paper, we propose a framework for modeling the intrinsic dimensionality of data sets. The models can be viewed as generalizations of the expansion dimension, which was originally proposed for the analysis of certain similarity search indices using the Euclidean distance metric. Here, we extend the original model to other metric spaces: vector spaces with the Lp or vector angle (cosine similarity) distance measures, as well as product spaces for categorical data. We also provide a practical guide for estimating both local and global intrinsic dimensionality. The estimates of data complexity can subsequently be used in the design and analysis of algorithms for data mining applications such as search, clustering, classification, and outlier detection.
Keywords
data mining; Euclidean distance metric; categorical data; cosine similarity distance measures; data complexity estimation; data mining applications; data set intrinsic dimensionality; generalized expansion dimension; global intrinsic dimensionality; local intrinsic dimensionality; product spaces; similarity search index analysis; vector angle distance measures; vector spaces; Complexity theory; Data mining; Data models; Extraterrestrial measurements; Search problems; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
Print_ISBN
978-1-4673-5164-5
Type
conf
DOI
10.1109/ICDMW.2012.94
Filename
6406405