Title :
Text representation: from vector to tensor
Author :
Liu, Ning ; Zhang, Benyu ; Yan, Jun ; Chen, Zheng ; Liu, Wenyin ; Bai, Fengshan ; Chien, Leefeng
Author_Institution :
Dept. of Math. Sci., Tsinghua Univ., Beijing, China
Abstract :
In this paper, we propose a text representation model, the Tensor Space Model (TSM), which represents text as a high-order tensor from multilinear algebra instead of the traditional vector. Supported by techniques of multilinear algebra, TSM offers a powerful mathematical framework for analyzing multifactor structures. TSM is further supported by a set of operations and tools, such as the High-Order Singular Value Decomposition (HOSVD), for dimension reduction and other applications. Experimental results on the 20 Newsgroups dataset show that TSM is consistently better than the Vector Space Model (VSM) for text classification.
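The idea of reducing a tensor's dimensions with HOSVD can be illustrated with a minimal NumPy sketch. The abstract does not say how TSM builds its tensor, so the character-trigram count tensor below (`text_to_tensor`, over a 27-symbol alphabet) is purely an assumption for illustration. All function names are hypothetical, and the truncated HOSVD shown is the generic textbook procedure, not necessarily the authors' exact method.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed 27-symbol alphabet
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def text_to_tensor(text):
    """Hypothetical encoding: count character trigrams in a 27x27x27 tensor."""
    chars = [c if c in INDEX else " " for c in text.lower()]
    tensor = np.zeros((len(ALPHABET),) * 3)
    for a, b, c in zip(chars, chars[1:], chars[2:]):
        tensor[INDEX[a], INDEX[b], INDEX[c]] += 1
    return tensor


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors), keeping `ranks[n]` left singular vectors per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each mode's subspace
    return core, factors


def reconstruct(core, factors):
    """Map a core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


if __name__ == "__main__":
    doc = text_to_tensor("tensor space model")
    core, factors = truncated_hosvd(doc, ranks=(5, 5, 5))
    print(doc.shape, "->", core.shape)
```

In this sketch the reduced core tensor, rather than the full count tensor, would serve as the document's feature representation. The ranks are free parameters chosen here arbitrarily.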
Keywords :
singular value decomposition; tensors; text analysis; vectors; dimension reduction; high-order singular value decomposition; multifactor structures; multilinear algebraic high-order tensor; tensor space model; text representation; vector space model; Asia; Computer science; Data mining; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Principal component analysis; Singular value decomposition; Tensile stress
Conference_Title :
Data Mining, Fifth IEEE International Conference on
Print_ISBN :
0-7695-2278-5
DOI :
10.1109/ICDM.2005.144