• DocumentCode
    2133376
  • Title

    Large scale topic modeling made practical

  • Author

    Wahlgreen, Bjarne Ørum ; Hansen, Lars Kai

  • Author_Institution
    Tech. Univ. of Denmark, Lyngby, Denmark
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Topic models are of broad interest. They can be used for query expansion and result structuring in information retrieval, and as an important component in services such as recommender systems and user-adaptive advertising. In large-scale applications, both the size of the database (number of documents) and the size of the vocabulary can be significant challenges. Here we discuss two mechanisms that can make scalable solutions possible in the face of large document databases and large vocabularies. The first issue is addressed by a parallel distributed implementation, while the vocabulary problem is reduced by use of a large and carefully curated term set. We demonstrate the performance of the proposed system and in the process break a previously claimed 'world record', announced in April 2010, in both speed and problem size. We show that the use of a WordNet-derived vocabulary can identify topics on par with a much larger case-specific vocabulary.
  • Keywords
    information retrieval; recommender systems; WordNet; curated term set; document database; large scale topic modeling; process break; query expansion; recommender system; result structuring; user adaptive advertising; vocabulary; Adaptation models; Computational modeling; Data models; Databases; Matrix decomposition; Mutual information; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on
  • Conference_Location
    Santander
  • ISSN
    1551-2541
  • Print_ISBN
    978-1-4577-1621-8
  • Electronic_ISBN
    1551-2541
  • Type

    conf

  • DOI
    10.1109/MLSP.2011.6064628
  • Filename
    6064628