• DocumentCode
    3301045
  • Title

    Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

  • Author

    Zhang, Chengzhi ; Song, Wei ; Li, Chenghua ; Yu, Wei

  • Author_Institution
    Dept. of Inf. Manage., Nanjing Univ. of Sci. & Technol., Nanjing
  • fYear
    2008
  • fDate
    19-22 Oct. 2008
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Because common clustering algorithms use the vector space model (VSM) to represent documents, they ignore the conceptual relationships between related terms that do not co-occur literally. To overcome this problem, this article proposes a genetic algorithm-based clustering technique, named GA clustering, used in conjunction with an ontology. In general, ontology-based similarity measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and broad-coverage taxonomy of WordNet as the thesaurus-based ontology. The corpus-based method, however, is rather complicated to handle in practical applications. We propose a transformed latent semantic analysis (LSA) model as the corpus-based method in this paper. Moreover, two hybrid strategies, which combine the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based methods, clearly outperforms the same algorithm with other similarity measures. Moreover, the improvements in clustering performance demonstrate the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA.
  • Keywords
    genetic algorithms; ontologies (artificial intelligence); pattern clustering; text analysis; thesauri; corpus-based methods; latent semantic analysis; ontology-based text clustering; quantitative semantic similarity measures; self-adaptive genetic algorithm; thesaurus-based methods; vector space model; Algorithm design and analysis; Clustering algorithms; Extraterrestrial measurements; Genetic algorithms; Information management; Iterative algorithms; Ontologies; Partitioning algorithms; Taxonomy; Web sites; Clustering; genetic algorithm; latent semantic analysis; ontology; semantic similarity measure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4515-8
  • Electronic_ISBN
    978-1-4244-2780-2
  • Type

    conf

  • DOI
    10.1109/NLPKE.2008.4906791
  • Filename
    4906791