Title :
Text document clustering with distributed noun with its compactness using relevance measure and Heuristic function
Author :
Vijayalakshmi, S. ; Manimegalai, D.
Author_Institution :
Dept. of Comput. Sci., Bharathiyar Univ., Coimbatore, India
Abstract :
This paper presents and discusses ensemble attribute selection methods for text document clustering. In this research, attributes are extracted in two levels. In the first level, a document representation based on the distribution of compact nouns is constructed, and a relevance measure is applied to this distributed compact noun-document representation to evaluate the importance of each noun. Attribute selection of this kind has been widely studied in supervised learning, but it is still relatively rarely researched in unsupervised learning. The Vector Space Model has been used in many text mining tasks, where it has achieved good results with acceptable computational complexity, so the proposed document representations exploit an nf-idf style equation. In this work, distributional nouns are selected in three different ways. In the first method, the distributed nouns are integrated with relevance measures; in the second, relevance-based distributional nouns are combined with a heuristic function to find the importance of attributes. The distributional noun representations are used to discriminate between nouns and to measure the semantic similarity between documents. The proposed feature selection method has been successfully applied to text clustering with the flocking algorithm and HCLK Means clustering, and the efficiency of the document representation has been evaluated on both synthetic and real datasets. It is found that the proposed algorithm identifies better feature sets and improves the clustering quality.
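The abstract does not give the nf-idf equation itself. As a rough illustration only, the sketch below assumes it is analogous to tf-idf restricted to nouns: each noun's in-document frequency multiplied by log(N / df). The function names `nf_idf` and `cosine`, the toy noun lists, and the use of cosine similarity are all assumptions made for this example, not the authors' method; noun extraction, the relevance measure, and the heuristic function are not modelled.

```python
import math
from collections import Counter


def nf_idf(noun_docs):
    """Weight each noun by its in-document count times log(N / document frequency).

    Assumption: "nf-idf" mirrors tf-idf, applied to nouns only.
    `noun_docs` is a list of documents, each already reduced to a list of nouns.
    """
    n_docs = len(noun_docs)
    # Document frequency: number of documents in which each noun occurs.
    df = Counter(noun for doc in noun_docs for noun in set(doc))
    vectors = []
    for doc in noun_docs:
        counts = Counter(doc)
        vectors.append(
            {noun: c * math.log(n_docs / df[noun]) for noun, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: nouns already extracted from three documents.
docs = [
    ["cluster", "document", "noun"],
    ["cluster", "algorithm", "flock"],
    ["noun", "document", "document"],
]
vectors = nf_idf(docs)
```

In this toy corpus, documents 0 and 2 share two nouns and so score a higher cosine similarity than documents 0 and 1, which share only one; a clustering algorithm operating on these vectors would tend to group the first pair together.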
Keywords :
learning (artificial intelligence); pattern clustering; text analysis; HCLK means clustering; distributed noun; document representation; feature selection method; flocking algorithm; heuristic function; relevance measure; supervised learning; text clustering; text document clustering; vector space model; Classification algorithms; Clustering algorithms; Pattern matching; Distributed Noun attributes; Flocking Algorithm; HCLK Mean; Heuristic function; Relevance Measure; RiTa WordNet;
Conference_Title :
Innovations in Information, Embedded and Communication Systems (ICIIECS), 2015 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-6817-6
DOI :
10.1109/ICIIECS.2015.7193204