• DocumentCode
    2207345
  • Title
    Simplex decompositions for real-valued datasets
  • Author
    Shashanka, Madhusudana
  • Author_Institution
    Mars, Inc., Mount Olive, NJ, USA
  • fYear
    2009
  • fDate
    1-4 Sept. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper, we introduce the concept of Simplex Decompositions and present a new Semi-Nonnegative decomposition technique that works with real-valued datasets. The motivation stems from the limitations of topic models such as Probabilistic Latent Semantic Analysis (PLSA), which have found wide use in the analysis of non-negative data beyond text corpora, such as images, audio spectra, and gene array data. The goal of this paper is to remove the non-negativity requirement so that these models can work on datasets with both positive and negative entries. We start by showing that PLSA is equivalent to finding a set of components that define the corners of a simplex within which all datapoints lie. We formalize this intuition by introducing the notion of Simplex Decompositions (PLSA and extensions are specific examples) and generalize the idea so that it applies to arbitrary real datasets with both positive and negative entries. We present algorithms and illustrate the method with examples.
  • Keywords
    data analysis; probability; singular value decomposition; nonnegative data analysis; probabilistic latent semantic analysis; real-valued dataset; semi-nonnegative decomposition; simplex decomposition; Application software; Computer vision; Data mining; Gene expression; Independent component analysis; Mars; Matrix decomposition; Principal component analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing, 2009. MLSP 2009. IEEE International Workshop on
  • Conference_Location
    Grenoble
  • Print_ISBN
    978-1-4244-4947-7
  • Electronic_ISBN
    978-1-4244-4948-4
  • Type
    conf
  • DOI
    10.1109/MLSP.2009.5306224
  • Filename
    5306224