• DocumentCode
    1114759
  • Title
    The Discrete Basis Problem
  • Author
    Miettinen, Pauli ; Mielikainen, T. ; Gionis, Aristides ; Das, Gautam ; Mannila, Heikki
  • Author_Institution
    Helsinki Inst. for Inf. Technol., Helsinki Univ., Helsinki
  • Volume
    20
  • Issue
    10
  • fYear
    2008
  • Firstpage
    1348
  • Lastpage
    1362
  • Abstract
    Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the observed data can be expressed as combinations of the basis vectors. Decomposition methods have been studied extensively, but many methods return real-valued matrices. Interpreting real-valued factor matrices is hard if the original data is Boolean. In this paper, we describe a matrix decomposition formulation for Boolean data, the Discrete Basis Problem. The problem seeks a Boolean decomposition of a binary matrix, thus allowing the user to easily interpret the basis vectors. We also describe a variation of the problem, the Discrete Basis Partitioning Problem. We show that both problems are NP-hard. For the Discrete Basis Problem, we give a simple greedy algorithm; for the Discrete Basis Partitioning Problem, we show how it can be solved using existing methods. We present experimental results for the greedy algorithm and compare it against other well-known methods. Our algorithm gives intuitive basis vectors, but its reconstruction error is usually larger than that of the real-valued methods. We discuss the reasons for this behavior.
  • Keywords
    Boolean algebra; data handling; greedy algorithms; matrix decomposition; Boolean decomposition; binary matrix; data matrix; discrete basis partitioning problem; greedy algorithm; matrix decomposition methods; real-valued factor matrices; Mining methods and algorithms; Clustering, classification, and association rules; Text mining
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2008.53
  • Filename
    4479462
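
The Boolean decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the usual Boolean matrix product (1 + 1 = 1) and measures reconstruction error as the number of differing entries; the abstract itself does not spell out either definition. The names `boolean_product`, `reconstruction_error`, `S`, `B`, and `C`, as well as the toy data, are hypothetical, and this is not the paper's greedy algorithm.

```python
def boolean_product(S, B):
    """Boolean matrix product: entry (i, j) is 1 if S[i][k] = B[k][j] = 1 for some k."""
    return [
        [int(any(s and b for s, b in zip(row, col))) for col in zip(*B)]
        for row in S
    ]


def reconstruction_error(C, S, B):
    """Count the entries where C differs from the Boolean product of S and B."""
    P = boolean_product(S, B)
    return sum(c != p for row_c, row_p in zip(C, P) for c, p in zip(row_c, row_p))


# Hypothetical toy data (not from the paper).
B = [[1, 1, 0, 0],   # basis vector 1 over four attributes
     [0, 1, 1, 1]]   # basis vector 2
S = [[1, 0],         # row 1 uses basis vector 1
     [0, 1],         # row 2 uses basis vector 2
     [1, 1]]         # row 3 uses both
C = [[1, 1, 0, 0],
     [0, 1, 1, 1],
     [1, 1, 1, 0]]   # observed binary data

print(boolean_product(S, B))
print(reconstruction_error(C, S, B))
```

In the third row both basis vectors cover the second attribute, yet the Boolean product yields 1 rather than 2, which is what keeps the factors binary and directly readable as sets of attributes.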