• DocumentCode
    2866095
  • Title
    Text representation: from vector to tensor
  • Author
    Liu, Ning; Zhang, Benyu; Yan, Jun; Chen, Zheng; Liu, Wenyin; Bai, Fengshan; Chien, Leefeng
  • Author_Institution
    Dept. of Math. Sci., Tsinghua Univ., Beijing, China
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    In this paper, we propose a text representation model, the Tensor Space Model (TSM), which models text as a multilinear algebraic high-order tensor instead of the traditional vector. Supported by techniques of multilinear algebra, TSM offers a potent mathematical framework for analyzing multifactor structures. TSM is further supported by specific operations and tools introduced in the paper, such as the High-Order Singular Value Decomposition (HOSVD) for dimension reduction and other applications. Experimental results on the 20 Newsgroups dataset show that TSM is consistently better than the Vector Space Model (VSM) for text classification.
  • Keywords
    singular value decomposition; tensors; text analysis; vectors; dimension reduction; high-order singular value decomposition; multifactor structures; multilinear algebraic high-order tensor; tensor space model; text representation; vector space model; Asia; Computer science; Data mining; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Principal component analysis; Singular value decomposition; Tensile stress
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fifth IEEE International Conference on Data Mining
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2005.144
  • Filename
    1565767