DocumentCode
2766321
Title
High Performance Dimension Reduction and Visualization for Large High-Dimensional Data Analysis
Author
Choi, Jong Youl ; Bae, Seung-Hee ; Qiu, Xiaohong ; Fox, Geoffrey
Author_Institution
Pervasive Technol. Inst., Indiana Univ., Bloomington, IN, USA
fYear
2010
fDate
17-20 May 2010
Firstpage
331
Lastpage
340
Abstract
Large high-dimensional datasets are of growing importance in many fields, and it is important to be able to visualize them, either to understand the results of data mining approaches or simply to browse them, in such a way that the distance between points in the visualization (2D or 3D) space tracks the distance in the original high-dimensional space. Dimension reduction is a well-understood approach but can be very time and memory intensive for large problems. Here we report on parallel algorithms for Scaling by MAjorizing a COmplicated Function (SMACOF), which solves the Multidimensional Scaling problem, and for Generative Topographic Mapping (GTM). The former is particularly time consuming, with complexity that grows as the square of the dataset size, but has the advantage that it does not require explicit vectors for the dataset points, only measurements of inter-point dissimilarities. We compare SMACOF and GTM on a subset of the NIH PubChem database, which has binary vectors of length 166 bits. We find good parallel performance for both GTM and SMACOF and a strong correlation between the dimension-reduced PubChem data from these two methods.
Keywords
Clouds; Clustering algorithms; Concurrent computing; Data analysis; Data mining; Data visualization; Grid computing; High performance computing; Machine learning algorithms; Multidimensional systems
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on
Conference_Location
Melbourne, Australia
Print_ISBN
978-1-4244-6987-1
Type
conf
DOI
10.1109/CCGRID.2010.104
Filename
5493466