• DocumentCode
    3357543
  • Title
    Understanding text corpora with multiple facets
  • Author
    Shi, Lei ; Wei, Furu ; Liu, Shixia ; Tan, Li ; Lian, Xiaoxiao ; Zhou, Michelle X.
  • Author_Institution
    IBM Res. - China, Beijing, China
  • fYear
    2010
  • fDate
    25-26 Oct. 2010
  • Firstpage
    99
  • Lastpage
    106
  • Abstract
    Text visualization is becoming an increasingly important research topic, as understanding massive-scale textual information has proven imperative for many people and businesses. However, it is still very challenging to design effective visual metaphors to represent large corpora of text, due to the unstructured and high-dimensional nature of text. In this paper, we propose a data model that can be used to represent most text corpora. This data model contains four basic types of facets: time, category, content (unstructured), and structured facets. To understand a corpus with such a data model, we develop a hybrid visualization that combines a trend graph with tag clouds. We encode the four types of data facets with four separate visual dimensions. To help people discover evolutionary and correlation patterns, we also develop several visual interaction methods that allow people to interactively analyze text by one or more facets. Finally, we present two case studies to demonstrate the effectiveness of our solution in supporting multi-faceted visual analysis of text corpora.
  • Keywords
    data models; data visualisation; text analysis; user interfaces; data facets; data model; text corpora; text visualization; visual analysis; visual interaction method; Correlation; Data mining; Data models; Data visualization; Layout; Navigation; Visualization; multi-facet data visualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4244-9488-0
  • Electronic_ISBN
    978-1-4244-9487-3
  • Type
    conf
  • DOI
    10.1109/VAST.2010.5652931
  • Filename
    5652931