  • DocumentCode
    119492
  • Title
    Serendip: Topic model-driven visual exploration of text corpora
  • Author
    Alexander, Eric; Kohlmann, Joe; Valenza, Robin; Witmore, Michael; Gleicher, Michael
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Wisconsin-Madison, Madison, WI, USA
  • fYear
    2014
  • fDate
    25-31 Oct. 2014
  • Firstpage
    173
  • Lastpage
    182
  • Abstract
    Exploration and discovery in a large text corpus requires investigation at multiple levels of abstraction, from a zoomed-out view of the entire corpus down to close-ups of individual passages and words. At each of these levels, there is a wealth of information that can inform inquiry - from statistical models, to metadata, to the researcher's own knowledge and expertise. Joining all this information together can be a challenge, and there are issues of scale to be combatted along the way. In this paper, we describe an approach to text analysis that addresses these challenges of scale and multiple information sources, using probabilistic topic models to structure exploration through multiple levels of inquiry in a way that fosters serendipitous discovery. In implementing this approach into a tool called Serendip, we incorporate topic model data and metadata into a highly reorderable matrix to expose corpus level trends; extend encodings of tagged text to illustrate probabilistic information at a passage level; and introduce a technique for visualizing individual word rankings, along with interaction techniques and new statistical methods to create links between different levels and information types. We describe example uses from both the humanities and visualization research that illustrate the benefits of our approach.
  • Keywords
    data visualisation; matrix algebra; meta data; probability; statistical analysis; text analysis; interaction technique; metadata; multiple information source; probabilistic information; probabilistic topic model; reorderable matrix; serendip; statistical method; statistical model; structure exploration; text corpora; topic model data; topic model-driven visual exploration; visualization research; word ranking; Adaptation models; Data models; Data visualization; Market research; Measurement; Probabilistic logic; Vectors; Text visualization; topic modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Visual Analytics Science and Technology (VAST), 2014 IEEE Conference on
  • Conference_Location
    Paris
  • Type
    conf
  • DOI
    10.1109/VAST.2014.7042493
  • Filename
    7042493