• DocumentCode
    2041353
  • Title

    Dimensionality reduction using non-negative matrix factorization for information retrieval

  • Author

    Tsuge, Satoru ; Shishibori, Masami ; Kuroiwa, Shingo ; Kita, Kenji

  • Author_Institution
    Fac. of Eng., Tokushima Univ., Japan
  • Volume
    2
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    960
  • Abstract
    The vector space model (VSM) is a conventional information retrieval model that represents a document collection by a term-by-document matrix. Since term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise, and it is difficult to capture the underlying semantic structure from them. Additionally, the storage and processing of such matrices place great demands on computing resources. Dimensionality reduction is a way to overcome these problems. Principal component analysis (PCA) and singular value decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition; however, their decomposed matrices contain both positive and negative values. In the work described here, we use non-negative matrix factorization (NMF) for dimensionality reduction of the vector space model. Since matrices decomposed by NMF contain only non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors. This characteristic of parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Also, because NMF computation is based on a simple iterative algorithm, it is advantageous for applications involving large matrices. Using the MEDLINE collection, we experimentally showed that NMF offers great improvement over the vector space model.
  • Keywords
    indexing; information retrieval; matrix decomposition; medical information systems; MEDLINE collection; additive vector combinations; dimensionality reduction; document collection; information retrieval model; iterative algorithm; matrix decomposition; nonnegative matrix factorization; parts-based representation; principal component analysis; singular value decomposition; term-by-document matrix; vector space model; Additives; Information retrieval; Information science; Intelligent systems; Iterative algorithms; Matrix decomposition; Principal component analysis; Sparse matrices; Systems engineering and theory; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics, 2001 IEEE International Conference on
  • Conference_Location
    Tucson, AZ
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7087-2
  • Type

    conf

  • DOI
    10.1109/ICSMC.2001.973042
  • Filename
    973042
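
The abstract above describes NMF as a simple iterative algorithm that factors a term-by-document matrix into non-negative parts, but it does not state the update rule. The following is a minimal sketch under the assumption that the well-known multiplicative updates for the squared Euclidean error are used; this may differ from what the paper actually does. The names `nmf`, `matmul`, `transpose`, and `frobenius_error`, the toy matrix, and all parameter defaults are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (terms x documents) as V ~ W H.

    W (terms x rank) holds the non-negative basis vectors; H (rank x documents)
    holds the reduced, non-negative representation of each document.
    Assumption: multiplicative updates for the squared Euclidean error.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    # Random positive initialization keeps every later update non-negative.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + eps for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps)
              for j in range(n_docs)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps)
              for k in range(rank)] for i in range(n_terms)]
    return w, h


# Toy 4-term x 4-document matrix with two obvious "topics" (made-up data).
toy = [[2.0, 1.0, 0.0, 0.0],
       [1.0, 2.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 2.0],
       [0.0, 0.0, 2.0, 1.0]]
w, h = nmf(toy, rank=2)
print("reconstruction error:", frobenius_error(toy, w, h))
```

Each column of `h` is a document expressed as a purely additive combination of the `rank` basis vectors in `w`, which is the parts-based property the abstract highlights. For a real collection the matrix would be large and sparse, so a sparse-matrix library would likely replace the plain lists used here.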