Title :
HFRECCA for clustering of text data from travel guide articles
Author :
Wazarkar, Seema V. ; Manjrekar, Amrita A.
Author_Institution :
Dept. of Technol., Shivaji Univ., Kolhapur, India
Abstract :
Text clustering is useful for extracting text data from web sources such as e-newspapers, collections of research papers, blogs, and news feeds on social networks. This paper presents a text clustering algorithm, the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). The algorithm combines fuzzy clustering, divisive hierarchical clustering, and the PageRank algorithm. Travel guide articles are pre-processed by removing stop words and applying stemming. A similarity matrix is then generated using word distance computation. HFRECCA applies divisive hierarchical clustering, using the Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (FRECCA) as a subroutine. FRECCA computes cluster membership values from PageRank scores and generates clusters accordingly. HFRECCA therefore has features of both hierarchical and fuzzy clustering: it creates a hierarchy of clusters, and an object can belong to multiple clusters. Because the information in text documents is structured hierarchically, HFRECCA is useful for clustering data from natural language documents.
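The PageRank scoring step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not cover the full FRECCA or HFRECCA procedure; it only shows, under stated assumptions, how PageRank-style centrality scores might be computed by power iteration over a pairwise similarity matrix. The function name `pagerank_scores`, the damping factor of 0.85, and the toy similarity values are all hypothetical choices made for illustration.

```python
def pagerank_scores(similarity, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a weighted similarity graph.

    `similarity` is a square list of lists; similarity[i][j] is the
    (assumed non-negative) similarity between objects i and j.
    """
    n = len(similarity)
    # Total weight leaving each object j, used to normalise its column.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    scores = [1.0 / n] * n  # start from a uniform distribution

    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    rank += similarity[i][j] * scores[j] / col_sums[j]
                else:
                    # Object with no similar neighbours: spread its score evenly.
                    rank += scores[j] / n
            new_scores.append((1.0 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical similarity matrix for four text segments (illustrative values only).
similarity = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
scores = pagerank_scores(similarity)
```

In this sketch the scores sum to 1, and objects with larger total similarity to the rest of the collection receive higher scores. According to the abstract, FRECCA turns such scores into fuzzy cluster membership values; that step is not shown here.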
Keywords :
eigenvalues and eigenfunctions; natural language processing; text analysis; text detection; Web applications; e-news papers; fuzzy clustering; hierarchical clustering algorithm; hierarchical fuzzy relational eigenvector centrality-based clustering algorithm; natural language documents; page rank algorithm; social networks; sub routine algorithm; text clustering; text data extraction; travel guide articles; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data mining; Data models; Partitioning algorithms; Semantics; Fuzzy clustering; Hierarchical clustering; Similarity Measure; Text Clustering;
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-3078-4
DOI :
10.1109/ICACCI.2014.6968349