• DocumentCode
    1582273
  • Title
    Text document clustering with distributed noun with its compactness using relevance measure and Heuristic function
  • Author
    Vijayalakshmi, S.; Manimegalai, D.
  • Author_Institution
    Dept. of Comput. Sci., Bharathiyar Univ., Coimbatore, India
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper presents and discusses several ensemble attribute selection methods for text document clustering. In this research, attributes are extracted at two levels. At the first level, a document representation based on the distribution of compact nouns is constructed, and a relevance measure is applied to the distributed compact noun-document representation to evaluate the importance of each noun. Attribute selection of this kind has been widely studied in supervised learning, but it is still comparatively rarely researched in unsupervised learning. The Vector Space Model has been used in many text mining tasks, where it has achieved good results with acceptable computational complexity, so the proposed document representations exploit an nf-idf style equation. In this work, distributional nouns are selected in three different ways. In the first method, the distributed nouns are integrated with relevance measures; in the second, relevance-based distributional nouns are combined with a heuristic function to determine the importance of attributes. The distributional noun representations are used to discriminate between nouns and to measure the semantic similarity between documents. The proposed feature selection method has been applied to text clustering with a flocking algorithm and with HCLK Means clustering, and the efficiency of the document representation has been evaluated on both synthetic and real datasets. The results show that the proposed algorithm identifies better feature sets and improves clustering quality.
  • Keywords
    learning (artificial intelligence); pattern clustering; text analysis; HCLK means clustering; distributed noun; document representation; feature selection method; flocking algorithm; heuristic function; relevance measure; supervised learning; text clustering; text document clustering; vector space model; Classification algorithms; Clustering algorithms; Pattern matching; Distributed Noun attributes; Flocking Algorithm; HCLK Mean; Heuristic function; Relevance Measure; RiTa WordNet;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Innovations in Information, Embedded and Communication Systems (ICIIECS), 2015 International Conference on
  • Conference_Location
    Coimbatore
  • Print_ISBN
    978-1-4799-6817-6
  • Type
    conf
  • DOI
    10.1109/ICIIECS.2015.7193204
  • Filename
    7193204
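The abstract says only that the document representation uses an "nf-idf style equation" and does not give the formula. The following is a minimal sketch under two loud assumptions: that nf-idf means ordinary tf-idf weighting restricted to nouns (raw noun count in a document multiplied by log(N/df)), and that nouns have already been extracted from each document (the keywords mention RiTa WordNet, but that step is not shown). It also shows cosine similarity, the usual Vector Space Model similarity between weighted documents. The function names `nf_idf` and `cosine` are hypothetical, and the paper's relevance measure and heuristic function are not modeled here.

```python
import math
from collections import Counter


def nf_idf(docs):
    """Weight nouns per document with an assumed tf-idf style scheme.

    docs: list of documents, each given as a list of already-extracted nouns.
    Returns one sparse vector (dict noun -> weight) per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each noun appears at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        nf = Counter(doc)  # raw noun frequency within this document
        vectors.append(
            {noun: count * math.log(n_docs / df[noun]) for noun, count in nf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(noun, 0.0) for noun, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is already reduced to its nouns.
    corpus = [
        ["cluster", "noun", "document"],
        ["cluster", "flocking", "agent"],
        ["noun", "wordnet", "document"],
    ]
    vecs = nf_idf(corpus)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With this unsmoothed idf, a noun that occurs in every document receives weight zero and therefore cannot help distinguish clusters; smoothed or sublinear variants are common alternatives, and which one the paper uses is not stated in the abstract.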