  • DocumentCode
    2771772
  • Title
    Convex Non-negative Matrix Factorization in the Wild
  • Author
    Thurau, Christian ; Kersting, Kristian ; Bauckhage, Christian
  • Author_Institution
    Fraunhofer IAIS, St. Augustin, Germany
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    523
  • Lastpage
    532
  • Abstract
    Non-negative matrix factorization (NMF) has recently received much attention in data mining, information retrieval, and computer vision. It factorizes a non-negative input matrix V into two non-negative matrix factors, V = WH, such that W describes "clusters" of the data set. When analyzing genotypes, social networks, or images, it can be beneficial to ensure that W contains meaningful "cluster centroids", i.e., to restrict the columns of W to be convex combinations of data points. But how can we run this convex NMF in the wild, i.e., on millions of data points? Triggered by the simple observation that each data point is a convex combination of vertices of the data's convex hull, we propose to further restrict W to vertices of the convex hull. The benefits of this convex-hull NMF approach are twofold. First, the expected size of the convex hull of, for example, n random Gaussian points in the plane is Θ(√log n), i.e., the candidate set typically grows much more slowly than the data set. Second, distance-preserving low-dimensional embeddings allow one to compute candidate vertices efficiently. Our extensive experimental evaluation shows that convex-hull NMF compares favorably to convex NMF on large data sets, both in terms of speed and reconstruction quality. Moreover, we show that our method can easily be applied to large-scale, real-world data sets, in our case two data sets consisting of 1.6 million images and 150 million votes on World of Warcraft® guilds, respectively.
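    The abstract outlines the candidate-selection idea without giving an algorithm. The Python sketch below illustrates it under our own assumptions and is not the authors' implementation. Random 2D linear projections stand in for the paper's distance-preserving embeddings; they are not distance preserving in general, but a linear map keeps points inside the hull inside the projected hull, so every projected hull vertex is (almost surely, for distinct points) a vertex of the original hull. All candidates become columns of W, whereas the full method would still choose a fixed number of basis vectors from them. Each column of H is fit on the probability simplex with Frank-Wolfe, a swapped-in solver. The names hull_2d, candidate_vertices, convex_coefficients, and chnmf are hypothetical.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sqdist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_2d(points):
    """Indices of 2D convex-hull vertices via Andrew's monotone chain.

    Points lying on a hull edge (collinear) are not reported as vertices.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) < 3:
        return order
    lower, upper = [], []
    for chain, seq in ((lower, order), (upper, order[::-1])):
        for i in seq:
            while len(chain) >= 2 and cross(points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
    return lower[:-1] + upper[:-1]


def candidate_vertices(data, n_projections=10, seed=0):
    """Union of 2D hull vertices over random linear projections of `data`.

    A linear map sends convex combinations to convex combinations, so a point
    whose projection is a hull vertex is a vertex of the full convex hull.
    """
    # Assumption: random Gaussian 2D projections stand in for the paper's embeddings.
    rng = random.Random(seed)
    dim = len(data[0])
    found = set()
    for _ in range(n_projections):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj = [(dot(a, p), dot(b, p)) for p in data]
        found.update(hull_2d(proj))
    return sorted(found)


def convex_coefficients(v, basis, iters=500, tol=1e-12):
    """Frank-Wolfe for min_h ||v - sum_j h[j] * basis[j]||^2 with h on the simplex."""
    # Start at the simplex vertex whose basis vector is closest to v.
    start = min(range(len(basis)), key=lambda j: sqdist(basis[j], v))
    h = [0.0] * len(basis)
    h[start] = 1.0
    approx = list(basis[start])  # current reconstruction W h
    for _ in range(iters):
        resid = [a - x for a, x in zip(approx, v)]  # W h - v
        grad = [dot(b, resid) for b in basis]  # W^T (W h - v), half the gradient
        i = min(range(len(basis)), key=grad.__getitem__)
        step_dir = [b - a for b, a in zip(basis[i], approx)]  # W e_i - W h
        gap = -dot(resid, step_dir)  # Frank-Wolfe duality gap (up to a factor of 2)
        if gap <= tol:
            break
        # Exact line search for the quadratic along step_dir, clipped to [0, 1].
        gamma = min(1.0, gap / dot(step_dir, step_dir))
        h = [(1.0 - gamma) * hj for hj in h]
        h[i] += gamma
        approx = [a + gamma * d for a, d in zip(approx, step_dir)]
    return h


def chnmf(data, n_projections=10, seed=0):
    """Hypothetical convex-hull NMF sketch: W = hull candidates, H on the simplex."""
    # Simplification: every candidate is kept as a column of W (no selection of k).
    idx = candidate_vertices(data, n_projections, seed)
    basis = [data[j] for j in idx]
    H = [convex_coefficients(v, basis) for v in data]
    return idx, H


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.7)]
    idx, H = chnmf(square, n_projections=5, seed=1)
    print("candidate indices:", idx)
```

    Only the candidate set, which the abstract argues grows much more slowly than the data, enters each per-point Frank-Wolfe solve, so the cost of one iteration is linear in the number of candidates rather than in n.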
  • Keywords
    computer vision; data mining; information retrieval; matrix decomposition; cluster centroids; convex hull non-negative matrix factorization; data convex hull; random Gaussian points; Data analysis; Embedded computing; Image analysis; Image reconstruction; Large-scale systems; Social network services; Voting; archetypal analysis; data handling; non-negative matrix factorization; social network analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.55
  • Filename
    5360278