DocumentCode
2866095
Title
Text representation: from vector to tensor
Author
Liu, Ning ; Zhang, Benyu ; Yan, Jun ; Chen, Zheng ; Liu, Wenyin ; Bai, Fengshan ; Chien, Leefeng
Author_Institution
Dept. of Math. Sci., Tsinghua Univ., Beijing, China
fYear
2005
fDate
27-30 Nov. 2005
Abstract
In this paper, we propose a text representation model, the Tensor Space Model (TSM), which represents text as a high-order tensor from multilinear algebra instead of the traditional vector. Supported by techniques of multilinear algebra, TSM offers a powerful mathematical framework for analyzing multifactor structures. TSM is further supported by the operations we introduce and the tools we present, such as the High-Order Singular Value Decomposition (HOSVD), for dimension reduction and other applications. Experimental results on the 20 Newsgroups dataset show that TSM consistently outperforms the Vector Space Model (VSM) for text classification.
Keywords
singular value decomposition; tensors; text analysis; vectors; dimension reduction; high-order singular value decomposition; multifactor structures; multilinear algebraic high-order tensor; tensor space model; text representation; vector space model; Asia; Computer science; Data mining; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Principal component analysis; Singular value decomposition; Tensile stress
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, Fifth IEEE International Conference on
ISSN
1550-4786
Print_ISBN
0-7695-2278-5
Type
conf
DOI
10.1109/ICDM.2005.144
Filename
1565767