Title :
Principal Curve Algorithms for Partitioning High-Dimensional Data Spaces
Author :
Zhang, Junping ; Wang, Xiaodan ; Kruger, Uwe ; Wang, Fei-Yue
Author_Institution :
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
fDate :
3/1/2011 12:00:00 AM
Abstract :
Most partitioning algorithms iteratively divide a space into cells that contain underlying linear or nonlinear structures, but they do so using linear partitioning strategies. The compactness of each cell therefore depends on how well the (locally) linear partitioning strategy approximates the intrinsic structure. To obtain compact partitions of complex data in a nonlinear context, this paper proposes a nonlinear partitioning strategy, the principal curve tree (PC-tree), which is implemented iteratively. Because a principal curve (PC) passes through the middle of the data distribution, it allows for partitioning based on the arc length of the PC. To enhance the partitioning of a given space, a residual version of the PC-tree algorithm is developed, denoted here as the principal component analysis tree (PCR-tree) algorithm. Because of its residual property, the PCR-tree can yield the intrinsic dimension of high-dimensional data. Comparisons presented in this paper confirm that the proposed PC-tree and PCR-tree approaches perform better than several competing partitioning algorithms in terms of vector quantization error and nearest neighbor search. The comparisons also show that the proposed algorithms outperform competing linear methods in total average coverage, which measures the nonlinear compactness of partitioning algorithms.
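The idea of partitioning by arc length can be illustrated with a minimal sketch. It is not the paper's PC-tree algorithm: the curve is assumed to be given as a fixed polyline rather than fitted to the data, the split threshold is assumed to be the median arc-length coordinate, and the names `project_to_polyline` and `split_by_arc_length` are hypothetical.

```python
import math


def project_to_polyline(point, vertices):
    """Return (arc_length, squared_distance) of the closest point on a polyline.

    The polyline stands in for a principal curve (an assumption of this sketch).
    """
    best_s, best_d2 = 0.0, math.inf
    s_start = 0.0  # arc length accumulated up to the current segment
    for a, b in zip(vertices, vertices[1:]):
        seg = [bj - aj for aj, bj in zip(a, b)]
        seg_len2 = sum(c * c for c in seg)
        seg_len = math.sqrt(seg_len2)
        # Clamp the projection parameter so the foot point stays on the segment.
        t = 0.0
        if seg_len2 > 0.0:
            t = sum((pj - aj) * cj for pj, aj, cj in zip(point, a, seg)) / seg_len2
            t = max(0.0, min(1.0, t))
        foot = [aj + t * cj for aj, cj in zip(a, seg)]
        d2 = sum((pj - fj) ** 2 for pj, fj in zip(point, foot))
        if d2 < best_d2:
            best_s, best_d2 = s_start + t * seg_len, d2
        s_start += seg_len
    return best_s, best_d2


def split_by_arc_length(points, vertices):
    """Split points into two cells at the median arc-length coordinate.

    The median threshold is an assumption, not the paper's splitting rule.
    """
    arc = [project_to_polyline(p, vertices)[0] for p in points]
    threshold = sorted(arc)[len(arc) // 2]
    left = [p for p, s in zip(points, arc) if s < threshold]
    right = [p for p, s in zip(points, arc) if s >= threshold]
    return left, right, threshold


# An L-shaped curve: a linear split cannot follow the bend, but arc length can.
curve = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
data = [(0.2, 0.1), (0.9, 0.8), (0.5, -0.1), (1.1, 0.5)]
left, right, threshold = split_by_arc_length(data, curve)
```

Applying the same split recursively to each cell, with a curve refitted per cell, would give a tree-shaped partition in the spirit of the abstract; the refitting step is omitted here.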
Keywords :
algorithm theory; data analysis; principal component analysis; tree data structures; PCR-tree; data distribution; high dimensional data space partitioning; linear partitioning strategy; nearest neighbor search; nonlinear structure; principal component analysis tree; principal curve tree; residual property; vector quantization error; Algorithm design and analysis; Approximation algorithms; Computational complexity; Manifolds; Partitioning algorithms; Principal component analysis; Manifold learning; principal component analysis; principal curves; space partitioning; tree-based algorithms; Algorithms; Artificial Intelligence; Data Interpretation, Statistical; Decision Trees; Models, Neurological; Neural Networks (Computer); Nonlinear Dynamics; Pattern Recognition, Automated; Principal Component Analysis;
Journal_Title :
Neural Networks, IEEE Transactions on
DOI :
10.1109/TNN.2010.2100408