• DocumentCode
    583172
  • Title
    A Genetic Niching Algorithm with Self-Adaptating Operator Rates for Document Clustering
  • Author
    Leon, Errol ; Gomez, Jose ; Nasraoui, Olfa
  • fYear
    2012
  • fDate
    25-27 Oct. 2012
  • Firstpage
    79
  • Lastpage
    86
  • Abstract
    We propose a genetic algorithm for document clustering, in which an evolutionary multimodal optimization algorithm evolves candidate cluster representatives to search for dense regions in the sparse, high-dimensional vector space of text documents. Evolution affects not only the document cluster representatives but also the genetic operator rates, which are evolved simultaneously with the representatives. The evolving population consists of candidate document cluster representatives encoded as variable-length sparse index and sparse index/frequency vectors. Specialized sparse genetic operators are defined for this representation. These operators achieve different degrees of exploitation and exploration in searching for the optimal document cluster prototypes. In particular, the operator most specialized for the document clustering problem is the Sparse Top-K-Addition operator, which can be seen as an incentive towards more aggressive exploitation of the local context in a small subset of documents, whereas the simple Sparse Real Addition operator works in a more exploratory manner. As shown in our experiments on two well-known document data sets, taking into account associated terms within a local context adds the benefit of an explicit latent semantic consideration in the search for optimal term lists to describe the cluster prototypes.
  • Keywords
    genetic algorithms; pattern clustering; text analysis; document clustering; evolutionary multimodal optimization algorithm; explicit latent semantic; frequency variable length vector; genetic niching algorithm; optimal document cluster prototype; self-adaptating operator rates; sparse index; sparse top-K-addition operator; specialized genetic operator; specialized sparse genetic operator; text document; Clustering algorithms; Frequency measurement; Genetics; Indexes; Mathematical model; Prototypes; Vectors; Genetic Clustering; Text Mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Congress (LA-WEB), 2012 Eighth Latin American
  • Conference_Location
    Cartagena de Indias
  • Print_ISBN
    978-1-4673-4473-9
  • Type
    conf
  • DOI
    10.1109/LA-WEB.2012.22
  • Filename
    6392142