  • DocumentCode
    2036926
  • Title
    Dimensionality Reduction in Webpage Categorization Using Probabilistic Latent Semantic Analysis and Adaptive General Particle Swarm Optimization
  • Author
    Tong Yala ; Wang Chunzhi
  • Author_Institution
    Sch. of Sci., Hubei Univ. of Technol., Wuhan
  • fYear
    2009
  • fDate
    23-24 May 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    A new text dimensionality reduction method based on probabilistic latent semantic analysis (PLSA) and adaptive general particle swarm optimization (AGPSO) is proposed. PLSA replaces the original document space with a space of essential associative semantic relationships, and fitting it with the expectation-maximization (EM) algorithm reduces the dimensionality substantially. In AGPSO, a crossover operator is designed to simulate the change in particle flying velocity, and a mutation operator is used to maintain population diversity. In addition, an adaptive strategy adjusts the crossover and mutation probabilities in order to obtain an optimal feature set. Experimental results indicate that the algorithm not only reduces dimensionality but also improves categorization precision.
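The abstract states that PLSA, fitted with the EM algorithm, replaces the original document space with a much smaller latent semantic space. The following is a minimal, generic sketch of PLSA fitted by EM (Hofmann's aspect model), not the authors' implementation; the function names `plsa` and `log_likelihood`, the random initialisation, the fixed iteration count, and the toy counts are illustrative assumptions. Each document's reduced representation is its vector P(z|d), whose length is the number of latent topics rather than the vocabulary size.

```python
import math
import random


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    s = sum(row)
    if s == 0:
        return [1.0 / len(row)] * len(row)
    return [x / s for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (hypothetical helper).

    counts[d][w] is n(d, w), the count of term w in document d.
    Returns (p_z_d, p_w_z): p_z_d[d][z] = P(z|d), p_w_z[z][w] = P(w|z).
    The rows of p_z_d are the reduced, n_topics-dimensional document vectors.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random strictly positive initialisation of P(z|d) and P(w|z).
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts accumulated during the E-step.
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = n * joint[z] / total
                    exp_w_z[z][w] += r
                    exp_z_d[d][z] += r
        # M-step: renormalise the expected counts into new distributions.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


if __name__ == "__main__":
    # Toy corpus: 4 documents over a 6-term vocabulary.
    counts = [[3, 2, 1, 0, 0, 0],
              [2, 3, 2, 0, 0, 0],
              [0, 0, 0, 2, 3, 1],
              [0, 0, 0, 1, 2, 3]]
    doc_vecs, _ = plsa(counts, n_topics=2)
    for vec in doc_vecs:
        print([round(x, 3) for x in vec])
```

Here each six-term row of the toy count matrix is mapped to a two-dimensional topic vector. Because EM never decreases the likelihood, `log_likelihood` is a convenient check that the iterations behave as expected. The abstract does not specify how AGPSO then selects the final feature set, so that step is not sketched here.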
  • Keywords
    Internet; data reduction; expectation-maximisation algorithm; mathematical operators; particle swarm optimisation; probability; text analysis; Webpage categorization; adaptive general particle swarm optimization; associative semantic relationship; crossover operator; expectation maximization algorithm; flying velocity alteration; probabilistic latent semantic analysis; text dimensionality reduction; Ant colony optimization; Computer science; Data mining; Evolutionary computation; Feature extraction; Functional analysis; Genetic mutations; Particle swarm optimization; Space technology; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems and Applications, 2009. ISA 2009. International Workshop on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-3893-8
  • Electronic_ISBN
    978-1-4244-3894-5
  • Type
    conf
  • DOI
    10.1109/IWISA.2009.5072835
  • Filename
    5072835