• DocumentCode
    1679532
  • Title
    A new geometric approach to latent topic modeling and discovery
  • Author
    Ding, Weicong ; Rohban, Mohammad Hossein ; Ishwar, Prakash ; Saligrama, Venkatesh
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Boston Univ., Boston, MA, USA
  • fYear
    2013
  • Firstpage
    5568
  • Lastpage
    5572
  • Abstract
    A new geometrically-motivated algorithm for topic modeling is developed and applied to the discovery of latent “topics” in text and image “document” corpora. The algorithm is based on robustly finding and clustering extreme-points of empirical cross-document word-frequencies that correspond to novel words unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
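The geometric idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract describes as robust and convex. It assumes noiseless toy word-by-document counts and swaps in random linear projections as the extreme-point finder: the maximiser or minimiser of a linear function over a finite point set lies on a vertex of its convex hull. It also uses a simple greedy tolerance-based grouping as the clustering step. All function names (`row_normalize`, `find_extreme_rows`, `cluster_rows`) and parameters (`num_projections`, `tol`, `seed`) are hypothetical.

```python
import random
from math import dist


def row_normalize(counts):
    """Turn each word's document counts into a cross-document frequency profile.

    Assumes no word has an all-zero row."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total for c in row])
    return normalized


def find_extreme_rows(rows, num_projections=200, tol=1e-9, seed=0):
    """Collect rows that maximise or minimise random linear projections.

    Hypothetical helper: every such row is (up to tolerance) an extreme point of
    the convex hull of all rows. Rows tied with the best or worst score are kept,
    so duplicated profiles (e.g. several novel words of one topic) are all found."""
    rng = random.Random(seed)
    dim = len(rows[0])
    extremes = set()
    for _ in range(num_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scores = [sum(a * b for a, b in zip(row, direction)) for row in rows]
        best, worst = max(scores), min(scores)
        extremes.update(
            i for i, s in enumerate(scores) if s >= best - tol or s <= worst + tol
        )
    return sorted(extremes)


def cluster_rows(rows, indices, tol=1e-9):
    """Greedily group extreme rows whose profiles (nearly) coincide.

    Each group is treated as the set of novel words of one candidate topic."""
    clusters = []
    for i in indices:
        for cluster in clusters:
            if dist(rows[i], rows[cluster[0]]) <= tol:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Toy counts over 4 documents (made-up data for illustration only).
    # Words 0-1 are novel to topic A, words 2-3 are novel to topic B, and
    # word 4 mixes both topics, so its profile lies between theirs.
    counts = [
        [3, 0, 2, 1],
        [6, 0, 4, 2],
        [0, 2, 1, 1],
        [0, 4, 2, 2],
        [3, 2, 3, 2],
    ]
    rows = row_normalize(counts)
    extreme = find_extreme_rows(rows)
    print(extreme)                      # [0, 1, 2, 3]
    print(cluster_rows(rows, extreme))  # [[0, 1], [2, 3]]
```

The mixed word never appears among the extreme rows because its profile is a convex combination of the two topic profiles. On real, noisy counts, exact ties and a fixed tolerance would not suffice, which is where a robust procedure such as the one the abstract describes is needed.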
  • Keywords
    approximation theory; data mining; document image processing; optimisation; pattern clustering; text analysis; empirical cross-document word-frequency; geometrically-motivated algorithm; image document corpora; latent topic discovery; latent topic modeling; locally-optimal method; nonconvex optimization problem; polynomial complexity; suboptimal approximation; Abstracts; Games; Integrated circuits; Logic gates; Nominations and elections; Support vector machines; Topic modeling; extreme points; nonnegative matrix factorization (NMF); subspace clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6638729
  • Filename
    6638729