DocumentCode
1797707
Title
Classifying web documents using term spectral transforms and Multi-Dimensional Latent Semantic representation
Author
Haijun Zhang ; Shifu Bie ; Bin Luo
Author_Institution
Dept. of Comput. Sci., Harbin Inst. of Technol., Shenzhen, China
fYear
2014
fDate
6-11 July 2014
Firstpage
1320
Lastpage
1327
Abstract
This research investigates the potential of document semantic representations that consider both term frequencies and term associations. In particular, we propose a general framework that uses term spectra to represent the spatial distributions and associations of terms throughout a document. We explore three typical techniques for computing term spectra: the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). A term affinity graph is established to represent each document. We then employ a new document analysis method recently developed by the authors, Multi-Dimensional Latent Semantic Analysis (MDLSA), which enables us to formulate an efficient semantic representation of a document based on the term affinity graph. Our algorithm is evaluated on Web document classification. Experimental results demonstrate that the proposed technique is not only much more computationally efficient than Direct Graph Matching (DGM), but also outperforms state-of-the-art algorithms such as VSM, PCA, RAP, and MLM.
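As a rough illustration of the term-spectrum idea in the abstract, the sketch below bins a term's positions within a document into a short signal and applies an unnormalized DCT-II to it. The abstract does not specify the binning scheme, the transform normalization, or how affinities between terms are computed, so the function names (`term_signal`, `dct2`, `spectral_affinity`), the `n_bins` parameter, and the cosine-similarity affinity are all assumptions made for illustration and should not be read as the authors' method.

```python
import math


def term_signal(tokens, term, n_bins=8):
    """Count occurrences of `term` in each of `n_bins` equal-width position bins.

    Hypothetical helper: the binning scheme is an assumption, not from the paper.
    """
    signal = [0.0] * n_bins
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            signal[i * n_bins // n] += 1.0
    return signal


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    size = len(signal)
    return [
        sum(x * math.cos(math.pi / size * (n + 0.5) * k) for n, x in enumerate(signal))
        for k in range(size)
    ]


def spectral_affinity(spec_a, spec_b):
    """Cosine similarity between two spectra (an assumed affinity measure)."""
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(a * a for a in spec_a))
    norm_b = math.sqrt(sum(b * b for b in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a term that never occurs has no usable spectrum
    return dot / (norm_a * norm_b)


tokens = "web page about web mining and web search".split()
web_signal = term_signal(tokens, "web", n_bins=4)
web_spectrum = dct2(web_signal)
```

With this unnormalized transform, the zeroth coefficient equals the term's raw count, while the higher coefficients describe how its occurrences are spread across the document. Pairwise affinities computed this way could serve as edge weights in a term affinity graph, although the paper may define those weights differently.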
Keywords
Internet; discrete Fourier transforms; discrete cosine transforms; discrete wavelet transforms; document handling; graph theory; natural language processing; pattern classification; DCT; DFT; DGM; DWT; Web document classification; direct graph matching; discrete Fourier transform; discrete cosine transform; discrete wavelet transform; document analysis method; document semantic representation; multidimensional latent semantic analysis; multidimensional latent semantic representation; term affinity graph; term associations; term frequencies; term spatial distributions; term spectra; term spectral transforms; Accuracy; Discrete Fourier transforms; Discrete wavelet transforms; Principal component analysis; Semantics; Vectors; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4799-6627-1
Type
conf
DOI
10.1109/IJCNN.2014.6889582
Filename
6889582