• DocumentCode
    166031
  • Title

    HFRECCA for clustering of text data from travel guide articles

  • Author

    Wazarkar, Seema V. ; Manjrekar, Amrita A.

  • Author_Institution
    Dept. of Technol., Shivaji Univ., Kolhapur, India
  • fYear
    2014
  • fDate
    24-27 Sept. 2014
  • Firstpage
    1486
  • Lastpage
    1489
  • Abstract
    Text clustering is useful for extracting text data from web applications such as e-newspapers, collections of research papers, blogs, and news feeds on social networks. This paper presents a text clustering algorithm, the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). The algorithm combines fuzzy clustering, divisive hierarchical clustering, and the PageRank algorithm. Travel guide articles are pre-processed by removing stop words and applying stemming. A similarity matrix is then generated using word distance computation. HFRECCA applies divisive hierarchical clustering, using the Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (FRECCA) as a subroutine. FRECCA computes cluster membership values from PageRank scores and generates clusters accordingly. HFRECCA has features of both hierarchical and fuzzy clustering: it creates a hierarchy of clusters, and an object can belong to multiple clusters. Because the structure of information in text documents is hierarchical, HFRECCA is useful for clustering data from natural language documents.
  • Keywords
    eigenvalues and eigenfunctions; natural language processing; text analysis; text detection; Web applications; e-news papers; fuzzy clustering; hierarchical clustering algorithm; hierarchical fuzzy relational eigenvector centrality-based clustering algorithm; natural language documents; page rank algorithm; social networks; sub routine algorithm; text clustering; text data extraction; travel guide articles; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data mining; Data models; Partitioning algorithms; Semantics; Fuzzy clustering; Hierarchical clustering; Similarity Measure; Text Clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
  • Conference_Location
    New Delhi
  • Print_ISBN
    978-1-4799-3078-4
  • Type

    conf

  • DOI
    10.1109/ICACCI.2014.6968349
  • Filename
    6968349
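
The abstract states that FRECCA derives cluster membership values from PageRank scores computed over a similarity matrix. The record gives no implementation details, so the following is only a minimal sketch of that one step: a plain power-iteration PageRank over a weighted similarity matrix. The function name `pagerank_scores`, the damping factor 0.85, the iteration limit, and the toy matrix are all assumptions made for illustration and are not taken from the paper.

```python
def pagerank_scores(similarity, damping=0.85, iterations=100, tol=1e-9):
    """Power-iteration PageRank over a square, non-negative similarity matrix.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n = len(similarity)
    # Total outgoing weight of each object j (column sum).
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    scores = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    # Share of j's score passed to i, weighted by similarity.
                    rank += similarity[i][j] / col_sums[j] * scores[j]
                else:
                    # Object with no links: spread its score uniformly.
                    rank += scores[j] / n
            new_scores.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy example (assumed data): object 0 is similar to both other objects.
toy_similarity = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
print(pagerank_scores(toy_similarity))
```

In this toy matrix the first object receives the highest score because it is the only one linked to both of the others. How such scores are turned into fuzzy memberships and a cluster hierarchy is described in the paper itself, not in this record.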