DocumentCode :
1997713
Title :
Exploration of dimensionality reduction for text visualization
Author :
Huang, Shiping ; Ward, Matthew O. ; Rundensteiner, Elke A.
Author_Institution :
Dept. of Comput. Sci., Worcester Polytech. Inst., MA, USA
fYear :
2005
fDate :
38538
Firstpage :
63
Lastpage :
74
Abstract :
In the text document visualization community, statistical analysis tools (e.g., principal component analysis and multidimensional scaling) and neurocomputation models (e.g., self-organizing feature maps) have been widely used for dimensionality reduction. The output dimensionality is often set to two, as this makes the results easy to plot. The validity and effectiveness of these approaches depend largely on the specific data sets used and on the semantics of the targeted applications. To date, there has been little evaluation, either numerical or empirical, that assesses and compares dimensionality reduction methods and processes. This paper proposes a mechanism for comparing and evaluating the effectiveness of dimensionality reduction techniques in the visual exploration of text document archives. We use multivariate visualization techniques and interactive visual exploration to study three problems: (a) which dimensionality reduction technique best preserves the interrelationships within a set of text documents; (b) how sensitive the results are to the number of output dimensions; and (c) whether redundant or unimportant words can be automatically removed from the vectors extracted from the documents while still preserving most of the information, thereby making dimensionality reduction more efficient. To study each problem, we generate supplemental dimensions based on several dimensionality reduction algorithms and on the parameters controlling these algorithms. We then visually analyze and explore the characteristics of the reduced-dimensional spaces using XmdvTool, a linked, multiview, multidimensional visual exploration tool. We compare the derived dimensions to features known to be present in the original data. Quantitative measures are also used to assess the quality of the results obtained with different numbers of output dimensions.
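Question (b), the sensitivity of the results to the number of output dimensions, can be made concrete with a distance-preservation score. The following is a minimal sketch, not the measure used in the paper: it assumes each document is already encoded as a row of a numeric term-weight matrix, uses PCA (computed with NumPy's SVD) as the reduction, and scores each output dimensionality k by a normalized stress, sqrt(sum (d_ij - d'_ij)^2 / sum d_ij^2), over all document pairs, where d_ij is the original distance and d'_ij the distance after reduction. The function names and the random toy data are hypothetical.

```python
import numpy as np


def pca_project(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X (documents) onto the first k principal components."""
    Xc = X - X.mean(axis=0)                      # center each term column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # shape: (n_docs, k)


def pairwise_distances(Y: np.ndarray) -> np.ndarray:
    """Euclidean distances for every unordered pair of rows (upper triangle)."""
    diff = Y[:, None, :] - Y[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(Y), k=1)
    return D[iu]


def normalized_stress(X: np.ndarray, k: int) -> float:
    """0.0 means all pairwise distances survive the reduction unchanged;
    larger values mean more distortion of inter-document relationships."""
    d_orig = pairwise_distances(X)
    d_red = pairwise_distances(pca_project(X, k))
    return float(np.sqrt(((d_orig - d_red) ** 2).sum() / (d_orig ** 2).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 30 "documents" x 12 "term weights".
    X = rng.random((30, 12))
    for k in (1, 2, 3, 5, 8, 12):
        print(f"k={k:2d}  stress={normalized_stress(X, k):.3f}")
```

Because an orthogonal projection never lengthens a distance, and each extra component only adds a non-negative term to every projected distance, the stress in this sketch can only fall or stay flat as k grows, reaching zero once k equals the rank of the centered matrix. A steep early drop followed by a plateau is one way to read off a reasonable output dimensionality; the paper itself combines quantitative measures with visual exploration in XmdvTool.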
Keywords :
data reduction; data visualisation; self-organising feature maps; statistical analysis; text analysis; XmdvTool; dimension reduction; dimensionality reduction exploration; interactive visual exploration; multidimensional scaling; multivariate visualization; multiview multidimensional visual exploration tool; neurocomputation model; self-organizing feature map; statistical analysis tool; text document archive; text document visualization; text visualization; Automatic generation control; Computer science; Data mining; Data visualization; Internet; Multidimensional systems; Principal component analysis; Self organizing feature maps; Statistical analysis; Wireless communication; Dimension reduction; self-organizing maps (SOM)
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the Third International Conference on Coordinated and Multiple Views in Exploratory Visualization (CMV 2005)
Print_ISBN :
0-7695-2396-X
Type :
conf
DOI :
10.1109/CMV.2005.8
Filename :
1508222