Title :
Two-stage framework for a topology-based projection and visualization of classified document collections
Author :
Oesterling, Patrick ; Scheuermann, Gerik ; Teresniak, Sven ; Heyer, Gerhard ; Koch, Steffen ; Ertl, Thomas ; Weber, Gunther H.
Author_Institution :
Univ. of Leipzig, Leipzig, Germany
Abstract :
During the last decades, electronic textual information has become the world's largest and most important information source. Daily newspapers, books, scientific and governmental publications, blogs and private messages have grown into a wellspring of endless information and knowledge. Since neither existing nor new information can be read in its entirety, we rely increasingly on computers to extract and visualize meaningful or interesting topics and documents from this huge information reservoir. In this paper, we extend, improve and combine existing individual approaches into an overall framework that supports topological analysis of high-dimensional document point clouds given by the well-known tf-idf document-term weighting method. We show that traditional distance-based approaches fail in very high-dimensional spaces, and we describe an improved two-stage method for topology-based projections from the original high-dimensional information space to both two-dimensional (2-D) and three-dimensional (3-D) visualizations. To demonstrate the accuracy and usability of this framework, we compare it to methods introduced recently and apply it to complex document and patent collections.
Keywords :
classification; computational geometry; data visualisation; document handling; information resources; blogs; books; classified document collections; daily newspapers; electronic textual information; high dimensional information space; information source; patent collections; private messages; publications; tf-idf document-term weighting method; topology-based projection; visualization; Clouds; Context; Data visualization; Electronic mail; Layout; Optimization; Topology; H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces-Theory and methods; I.5.3 [Pattern Recognition]: Clustering-Algorithms;
Conference_Title :
Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4244-9488-0
Electronic_ISBN :
978-1-4244-9487-3
DOI :
10.1109/VAST.2010.5652940