Title :
UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
Author :
Choo, Jaegul ; Lee, Changhyun ; Reddy, C.K. ; Park, Haesun
Author_Institution :
Georgia Institute of Technology, Atlanta, GA, USA
Abstract :
Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advances in topic modeling techniques, particularly in the form of probabilistic graphical models. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most of the widely used methods based on probabilistic modeling have drawbacks in terms of consistency across multiple runs and empirical convergence. Furthermore, due to the complexity of its formulation and algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capability of UTOPIAN via several usage scenarios with real-world document corpora such as the InfoVis/VAST paper data set and product review data sets.
Keywords :
data analysis; data visualisation; interactive systems; matrix decomposition; text analysis; UTOPIAN; flexible visual analytics system; latent Dirichlet allocation; probabilistic graphical modeling; real-world document corpuses; reliable visual analytics system; semisupervised formulation; text document collection analysis; topic modeling method; topic modeling techniques; user feedback; user-driven manner; user-driven topic modeling based on interactive nonnegative matrix factorization; visual text analytics; Analytical models; Computational modeling; Context modeling; Interactive states; Visual analytics; interactive clustering; nonnegative matrix factorization; text analytics; topic modeling; Artificial Intelligence; Computer Graphics; Computer Simulation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Models, Statistical; Natural Language Processing; Pattern Recognition, Automated; Software;
Journal_Title :
IEEE Transactions on Visualization and Computer Graphics
DOI :
10.1109/TVCG.2013.212