• DocumentCode
    1797707
  • Title

    Classifying web documents using term spectral transforms and Multi-Dimensional Latent Semantic representation

  • Author

    Haijun Zhang ; Shifu Bie ; Bin Luo

  • Author_Institution
    Dept. of Comput. Sci., Harbin Inst. of Technol., Shenzhen, China
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    1320
  • Lastpage
    1327
  • Abstract
    This research investigates the potential of document semantic representations that consider both term frequencies and term associations. In particular, we propose a general framework that uses term spectra to represent the spatial distributions and associations of terms throughout a document. The term spectra we explored are computed with three typical techniques: Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), and Discrete Wavelet Transform (DWT). A term affinity graph is established to represent each document. We then employ a new document analysis method recently developed by the authors, named Multi-Dimensional Latent Semantic Analysis (MDLSA), which enables us to formulate an efficient semantic representation of a document based on the term affinity graph. Our algorithm was evaluated on Web document classification. Experimental results demonstrate that the proposed technique is not only much more computationally efficient than Direct Graph Matching (DGM), but also outperforms state-of-the-art algorithms such as VSM, PCA, RAP, and MLM.
  • Keywords
    Internet; discrete Fourier transforms; discrete cosine transforms; discrete wavelet transforms; document handling; graph theory; natural language processing; pattern classification; DCT; DFT; DGM; DWT; Web document classification; direct graph matching; discrete Fourier transform; discrete cosine transform; discrete wavelet transform; document analysis method; document semantic representation; multidimensional latent semantic analysis; multidimensional latent semantic representation; term affinity graph; term associations; term frequencies; term spatial distributions; term spectra; term spectral transforms; Accuracy; Discrete Fourier transforms; Discrete wavelet transforms; Principal component analysis; Semantics; Vectors; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type

    conf

  • DOI
    10.1109/IJCNN.2014.6889582
  • Filename
    6889582