DocumentCode
166031
Title
HFRECCA for clustering of text data from travel guide articles
Author
Wazarkar, Seema V. ; Manjrekar, Amrita A.
Author_Institution
Dept. of Technol., Shivaji Univ., Kolhapur, India
fYear
2014
fDate
24-27 Sept. 2014
Firstpage
1486
Lastpage
1489
Abstract
Text clustering is useful for extracting text data from web applications such as e-newspapers, collections of research papers, blogs, and news feeds on social networks. This paper presents a text clustering algorithm, the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). The algorithm combines fuzzy clustering, divisive hierarchical clustering, and the PageRank algorithm. Travel guide articles are pre-processed by removing stop words and applying stemming. A similarity matrix is then generated using word distance computation. HFRECCA applies divisive hierarchical clustering, using the Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (FRECCA) as a subroutine. FRECCA computes cluster membership values from PageRank scores and generates clusters accordingly. HFRECCA has the features of both hierarchical and fuzzy clustering: it creates a hierarchy of clusters, and an object can belong to multiple clusters. Because the structure of information in text documents is hierarchical, HFRECCA is useful for clustering data from natural language documents.
Keywords
eigenvalues and eigenfunctions; natural language processing; text analysis; text detection; Web applications; e-news papers; fuzzy clustering; hierarchical clustering algorithm; hierarchical fuzzy relational eigenvector centrality-based clustering algorithm; natural language documents; page rank algorithm; social networks; sub routine algorithm; text clustering; text data extraction; travel guide articles; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data mining; Data models; Partitioning algorithms; Semantics; Fuzzy clustering; Hierarchical clustering; Similarity Measure; Text Clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location
New Delhi
Print_ISBN
978-1-4799-3078-4
Type
conf
DOI
10.1109/ICACCI.2014.6968349
Filename
6968349