• DocumentCode
    2136326
  • Title
    Web page clustering using Harmony Search optimization
  • Author
    Forsati, Rana ; Mahdavi, Mehrdad ; Kangavari, Mohammadreza ; Safarkhani, Banafsheh
  • Author_Institution
    Dept. of Comput. Eng., Islamic Azad Univ., Karaj
  • fYear
    2008
  • fDate
    4-7 May 2008
  • Abstract
    Clustering has become an increasingly important task in modern application domains. Targeting useful and relevant information on the World Wide Web is a topical and highly complicated research area. Clustering techniques have been applied to categorize documents on the Web and to extract knowledge from it. In this paper we propose novel algorithms for Web document clustering based on the harmony search (HS) optimization method. By modeling clustering as an optimization problem, we first propose a pure HS-based clustering algorithm that finds near-globally-optimal clusters within a reasonable time. We then hybridize K-means and harmony clustering to achieve better clustering. Experimental results on five different data sets reveal that the proposed algorithms can find better clusters when compared to similar methods and the quality of clusters is comparable. The proposed algorithms also converge to the best known optimum faster than other methods.
  • Keywords
    Web sites; document handling; information analysis; knowledge acquisition; Web document clustering; Web knowledge extraction; Web page clustering; World Wide Web; document categorization; harmony search optimization; Clustering algorithms; Clustering methods; Convergence; Data engineering; Data mining; Frequency; Genetic algorithms; Optimization methods; Partitioning algorithms; Web pages; clustering web pages; global optimization; harmony search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
  • Conference_Location
    Niagara Falls, ON
  • ISSN
    0840-7789
  • Print_ISBN
    978-1-4244-1642-4
  • Electronic_ISBN
    0840-7789
  • Type
    conf
  • DOI
    10.1109/CCECE.2008.4564812
  • Filename
    4564812
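
  • Illustrative sketch
    The abstract describes clustering posed as an optimization problem, solved with harmony search and then combined with K-means. The record does not give the authors' encoding, objective, or parameter values, so the following is only a minimal generic sketch under stated assumptions: each harmony is taken to be a set of k centroids, the objective is assumed to be the sum of squared Euclidean distances to the nearest centroid, and the names hms (harmony memory size), hmcr (memory considering rate), par (pitch adjusting rate), and bandwidth, along with their default values, are illustrative choices rather than values from the paper. The kmeans_refine function is a hypothetical stand-in for the hybrid step and simply runs a few Lloyd iterations from the best harmony.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


def harmony_search_clustering(points, k, hms=10, hmcr=0.9, par=0.3,
                              bandwidth=0.1, iterations=500, seed=0):
    """Generic harmony search over centroid sets (assumed encoding)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]

    # Initialise the harmony memory with random centroid sets.
    memory = []
    for _ in range(hms):
        h = [[rng.uniform(lows[d], highs[d]) for d in range(dim)]
             for _ in range(k)]
        memory.append((sse(points, h), h))

    for _ in range(iterations):
        new = []
        for j in range(k):
            centroid = []
            for d in range(dim):
                if rng.random() < hmcr:
                    # Memory consideration: reuse a stored value.
                    value = rng.choice(memory)[1][j][d]
                    if rng.random() < par:
                        # Pitch adjustment: small bounded perturbation.
                        step = bandwidth * (highs[d] - lows[d])
                        value += rng.uniform(-1, 1) * step
                        value = min(max(value, lows[d]), highs[d])
                else:
                    # Random selection from the allowed range.
                    value = rng.uniform(lows[d], highs[d])
                centroid.append(value)
            new.append(centroid)
        cost = sse(points, new)
        worst = max(range(hms), key=lambda i: memory[i][0])
        if cost < memory[worst][0]:
            memory[worst] = (cost, new)  # replace the worst harmony

    return min(memory, key=lambda m: m[0])  # (cost, centroids)


def kmeans_refine(points, centroids, steps=10):
    """Hypothetical hybrid step: a few Lloyd iterations from given centroids."""
    centroids = [list(c) for c in centroids]
    dim = len(points[0])
    for _ in range(steps):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[j])),
            )
            groups[nearest].append(p)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(p[d] for p in members) / len(members)
                                for d in range(dim)]
    return sse(points, centroids), centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    hs_cost, hs_centroids = harmony_search_clustering(data, k=2)
    km_cost, km_centroids = kmeans_refine(data, hs_centroids)
    print(hs_cost, km_cost)
```

    In this sketch the refinement can only keep or lower the objective reached by harmony search, which is one plausible reading of why a hybrid might yield better clusters. Real Web documents would first need a vector representation such as term weights, which is outside the scope of this record.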