Title :
Using Betweenness Centrality to Identify Manifold Shortcuts
Author :
Cukierski, William J. ; Foran, David J.
Author_Institution :
Rutgers Univ., Piscataway, NJ
Abstract :
High-dimensional data presents a significant challenge to a broad spectrum of pattern recognition and machine-learning applications. Dimensionality reduction (DR) methods serve to remove unwanted variance and make such problems tractable. Several nonlinear DR methods, such as the well-known ISOMAP algorithm, rely on a neighborhood graph to compute geodesic distances between data points. These graphs may sometimes contain unwanted edges that connect disparate regions of one or more manifolds. This topological sensitivity is well known, yet managing high-dimensional, noisy data in the absence of a priori knowledge remains an open and difficult problem. This manuscript introduces a divisive edge-removal method, based on graph betweenness centrality, that can robustly identify manifold-shorting edges. The problem of graph construction in high dimensions is discussed, and the proposed algorithm is inserted into the ISOMAP workflow. ROC analysis is performed, and the method's performance is tested on both synthetic and real datasets.
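A minimal sketch, in standard-library Python, of the general idea the abstract describes: build a symmetric k-nearest-neighbour graph, score every edge by weighted edge betweenness (computed here with Brandes' algorithm), and repeatedly delete the highest-scoring edge, recomputing scores after each deletion. The names `knn_graph`, `edge_betweenness` and `remove_shortcuts`, and the fixed `n_remove` stopping rule, are hypothetical illustrations, not the authors' algorithm or their stopping criterion. In an ISOMAP workflow the pruned graph would then feed the geodesic (shortest-path) and MDS steps.

```python
import heapq
import math
from collections import defaultdict


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            adj[i][j] = d  # keep the edge in both directions (undirected graph)
            adj[j][i] = d
    return adj


def edge_betweenness(adj, eps=1e-12):
    """Brandes' algorithm for weighted, undirected edge betweenness (positive weights)."""
    eb = defaultdict(float)
    for s in adj:
        # Dijkstra from s, recording shortest-path counts (sigma) and predecessors.
        dist = {s: 0.0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order, done, heap = [], set(), [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue
            done.add(v)
            order.append(v)
            for w, wt in adj[v].items():
                nd = d + wt
                if w not in dist or nd < dist[w] - eps:  # strictly shorter path found
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif abs(nd - dist[w]) <= eps:  # another equally short path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest vertex inward.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in eb.items()}


def remove_shortcuts(adj, n_remove):
    """Hypothetical divisive loop: delete the highest-betweenness edge n_remove times."""
    adj = {v: dict(nbrs) for v, nbrs in adj.items()}  # work on a copy
    removed = []
    for _ in range(n_remove):
        eb = edge_betweenness(adj)
        if not eb:
            break
        u, v = tuple(max(eb, key=eb.get))
        del adj[u][v]
        del adj[v][u]
        removed.append((min(u, v), max(u, v)))
    return adj, removed


if __name__ == "__main__":
    # Two tight clusters plus one artificial long "shortcut" edge between them.
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
    g = knn_graph(pts, 3)
    g[1][4] = g[4][1] = 9.0
    _, removed = remove_shortcuts(g, 1)
    print(removed)  # prints [(1, 4)]
```

Recomputing betweenness after every deletion is what makes the loop divisive: once one shortcut is gone, shortest paths reroute and the next shortcut can stand out. Each recomputation costs O(VE + V² log V), so this sketch is only meant for small graphs.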
Keywords :
data reduction; graph theory; learning (artificial intelligence); dimensionality reduction method; graph construction; graph edge-removal method; high-dimensional data; isometric mapping algorithm; machine-learning application; manifold shortcut; pattern recognition; Clustering algorithms; Conferences; Data mining; Dentistry; Geophysics computing; Knowledge management; Manifolds; Nonlinear distortion; Pattern recognition; Robustness; betweenness; centrality; dimensionality reduction; graph theory; isomap
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
DOI :
10.1109/ICDMW.2008.39