• DocumentCode
    2219432
  • Title
    A hyper-heuristic approach to design and tuning heuristic methods for web document clustering
  • Author
    Cobos, Carlos ; Mendoza, Martha ; León, Elizabeth
  • Author_Institution
    Comput. Sci. Dept., Univ. del Cauca, Popayan, Colombia
  • fYear
    2011
  • fDate
    5-8 June 2011
  • Firstpage
    1350
  • Lastpage
    1358
  • Abstract
    This paper introduces HHWDC, a new description-centric algorithm for web document clustering. HHWDC was designed following a hyper-heuristic approach and can be used to define the best algorithm for web document clustering. It offers two heuristic selection methods: random selection, and roulette wheel selection based on the performance of the low-level heuristics (harmony search, an improved harmony search, a novel global harmony search, global-best harmony search, restrictive mating, roulette wheel selection, and particle swarm optimization). HHWDC uses the k-means algorithm as its local solution improvement strategy and uses the Bayesian Information Criterion (BIC) to determine the number of clusters automatically. It supports two acceptance/replacement strategies: Replace the Worst and Restricted Competition Replacement. HHWDC was tested on data sets based on Reuters-21578 and DMOZ and obtained promising results, including better precision than a Singular Value Decomposition (SVD) algorithm.
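    The two heuristic selection options named in the abstract (random selection and performance-based roulette wheel selection over the low-level heuristics) can be sketched as below. This is a minimal illustration, not the authors' implementation: the abstract does not say how heuristic performance is measured, so the non-negative score per heuristic, the `reward` update (one point per improving application), and the small weight floor are all hypothetical assumptions. The heuristic names are taken from the abstract.

```python
import random
from typing import Dict, List

# Low-level heuristic names from the abstract; the identifiers are illustrative.
HEURISTICS: List[str] = [
    "harmony_search",
    "improved_harmony_search",
    "novel_global_harmony_search",
    "global_best_harmony_search",
    "restrictive_mating",
    "roulette_wheel_selection",
    "particle_swarm_optimization",
]


def select_random(heuristics: List[str], rng: random.Random) -> str:
    """Random selection: every low-level heuristic is equally likely."""
    return rng.choice(heuristics)


def select_roulette(performance: Dict[str, float], rng: random.Random) -> str:
    """Roulette wheel selection: probability proportional to each heuristic's score.

    Assumption: scores are non-negative performance values. A tiny floor keeps
    heuristics with a zero score selectable so they can still earn credit.
    """
    names = list(performance)
    weights = [max(performance[n], 0.0) + 1e-6 for n in names]
    pick = rng.uniform(0.0, sum(weights))  # spin the wheel
    cumulative = 0.0
    for name, weight in zip(names, weights):
        cumulative += weight
        if pick <= cumulative:
            return name
    return names[-1]  # guard against floating-point round-off


def reward(performance: Dict[str, float], name: str, improved: bool) -> None:
    """Hypothetical credit assignment: +1 each time a heuristic improves the solution."""
    if improved:
        performance[name] += 1.0
```

    In a hyper-heuristic loop, each iteration would pick a heuristic with `select_roulette` (or `select_random`), apply it, and call `reward` with whether the result improved. The standard library's `random.choices(names, weights=weights)` gives the same proportional draw in one call; the explicit cumulative loop is kept here only to make the wheel visible.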
  • Keywords
    Bayes methods; Internet; document handling; particle swarm optimisation; pattern clustering; Bayesian information criteria; DMOZ; HHWDC algorithm; Web document clustering; data set; description-centric algorithm; heuristic method design; heuristic method tuning; hyper-heuristic approach; k-means algorithm; local solution improvement strategy; restricted competition replacement; Algorithm design and analysis; Bandwidth; Clustering algorithms; Heuristic algorithms; Particle swarm optimization; Partitioning algorithms; Time division multiplexing; genetic algorithm; harmony search; hyper-heuristic; memetic algorithm; particle swarm; web document clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Evolutionary Computation (CEC), 2011 IEEE Congress on
  • Conference_Location
    New Orleans, LA
  • ISSN
    Pending
  • Print_ISBN
    978-1-4244-7834-7
  • Type
    conf
  • DOI
    10.1109/CEC.2011.5949773
  • Filename
    5949773