DocumentCode
3165733
Title
Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions
Author
Tianyi Zhou ; Wei Bian ; Dacheng Tao
Author_Institution
Centre for Quantum Comput. & Intell. Syst., Univ. of Technol. Sydney, Sydney, NSW, Australia
fYear
2013
fDate
7-10 Dec. 2013
Firstpage
917
Lastpage
926
Abstract
Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with unique solution under separability assumption, which postulates all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and results in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from low-dimensional geometry of the data points even when their high dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in convex hull and conical hull, respectively. These two rapid algorithms in the ultra low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data, and capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving math cal O(klog k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
Keywords
data handling; divide and conquer methods; geometry; iterative methods; linear programming; matrix decomposition; conical hull; convex hull; divide-and-conquer anchoring; fast hypothesis testing; greedy pursuit methods; linear programming; low- dimensional geometry; low-dimensional random spaces; near-separable nonnegative matrix factorization; polynomial time; separability assumption; Bismuth; Geometry; Noise; Noise measurement; Robustness; Testing; Vectors; Nonnegative matrix factorization; anchor points; big data; conical hull; divide-and-conquer; matrix completion; random projection; separability assumption;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location
Dallas, TX
ISSN
1550-4786
Type
conf
DOI
10.1109/ICDM.2013.29
Filename
6729576
Link To Document