• DocumentCode
    20568
  • Title
    Multidimensional Latent Semantic Analysis Using Term Spatial Information
  • Author
    Zhang, Haijun; Ho, John K. L.; Wu, Q. M. Jonathan; Ye, Yunming
  • Author_Institution
    Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
  • Volume
    43
  • Issue
    6
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    1625
  • Lastpage
    1640
  • Abstract
    In this paper, we consider the problem of in-depth document analysis. In particular, we propose a novel document analysis method, named multidimensional latent semantic analysis (MDLSA), which enables us to mine local information efficiently from a document with respect to term associations and spatial distributions. MDLSA works by first partitioning each document into paragraphs and building a term affinity graph, which represents the frequency of term cooccurrence in a paragraph. We then conduct a 2-D principal component analysis to achieve an optimal semantic mapping. This analysis involves finding the leading eigenvectors of the sample covariance matrix of a training set to characterize the lower dimensional semantic space. A hybrid document similarity measure is designed to further improve the performance of this framework. Our algorithm is examined in two document applications: retrieval and classification. Experimental results demonstrate that the proposed technique outperforms current algorithms with respect to accuracy and computational efficiency.
  • Keywords
    covariance matrices; data mining; document handling; eigenvalues and eigenfunctions; information retrieval; natural language processing; pattern classification; principal component analysis; 2D principal component analysis; MDLSA; classification application; covariance matrix; eigenvector; hybrid document similarity measure; in-depth document analysis; local information mining; lower dimensional semantic space; multidimensional latent semantic analysis; optimal semantic mapping; retrieval application; spatial distributions; term affinity graph; term associations; term cooccurrence frequency; term spatial information; Covariance matrix; Feature extraction; Large scale integration; Principal component analysis; Semantics; Vectors; Vocabulary; Dimensionality reduction; multidimensional; principal component analysis (PCA); semantic analysis; term association
  • fLanguage
    English
  • Journal_Title
    Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2267
  • Type
    jour
  • DOI
    10.1109/TSMCC.2012.2227112
  • Filename
    6416033
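
  • Illustrative sketch (not part of the original record)
    The abstract outlines a pipeline: split each document into paragraphs, build a term affinity matrix from within-paragraph term co-occurrence, then apply 2-D principal component analysis using the leading eigenvectors of the sample covariance matrix. The Python sketch below is one possible reading of those steps, not the authors' implementation. The function names, the toy vocabulary, the blank-line paragraph splitting, and the choice to put per-term paragraph counts on the diagonal are all assumptions made for illustration; the hybrid similarity measure is not covered.

```python
import re
from itertools import combinations

import numpy as np


def term_affinity_matrix(document, vocab):
    """Count how many paragraphs each pair of vocabulary terms shares.

    Assumption: paragraphs are separated by blank lines, and the diagonal
    holds the number of paragraphs in which each term occurs.
    """
    index = {term: i for i, term in enumerate(vocab)}
    affinity = np.zeros((len(vocab), len(vocab)))
    for paragraph in re.split(r"\n\s*\n", document.strip()):
        words = re.findall(r"[a-z]+", paragraph.lower())
        present = sorted({index[w] for w in words if w in index})
        for i in present:
            affinity[i, i] += 1
        for i, j in combinations(present, 2):
            affinity[i, j] += 1
            affinity[j, i] += 1
    return affinity


def fit_2dpca(matrices, n_components):
    """Return the leading eigenvectors of the 2-D PCA covariance matrix."""
    stack = np.stack(matrices)
    centered = stack - stack.mean(axis=0)
    # G = (1/N) * sum_n (A_n - mean)^T (A_n - mean)
    covariance = np.einsum("nij,nik->jk", centered, centered) / len(stack)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvectors[:, order]


# Toy data (hypothetical): three terms, three short documents.
vocab = ["term", "graph", "matrix"]
docs = [
    "term graph\n\nterm matrix",
    "graph matrix\n\ngraph",
    "term\n\nmatrix graph term",
]
mats = [term_affinity_matrix(d, vocab) for d in docs]
X = fit_2dpca(mats, n_components=2)
features = [A @ X for A in mats]  # each document becomes a 3 x 2 feature matrix
```

    Each document is mapped to a small feature matrix rather than a single vector, which is what distinguishes 2-D PCA from ordinary PCA on flattened term vectors. Feature matrices could then be compared with, for example, a Frobenius-norm distance, though the record does not say which measure the paper uses.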