• DocumentCode
    128708
  • Title

    Clustering based semantic data summarization technique: A new approach

  • Author

    Ahmed, Mariwan ; Mahmood, Abdun Naser

  • Author_Institution
    Sch. of Eng. & Inf. Technol., Univ. of New South Wales, Canberra, ACT, Australia
  • fYear
    2014
  • fDate
    9-11 June 2014
  • Firstpage
    1780
  • Lastpage
    1785
  • Abstract
    Advances in computing and the proliferation of data repositories call for efficient data mining techniques to extract meaningful information. Summarization is one such important data analysis technique, and it can be broadly classified into two categories: semantic and syntactic methods. Syntactic methods treat a dataset as a sequence of bytes, whereas semantic methods convert a large dataset into a much smaller one while maintaining low information loss. Clustering algorithms, such as basic k-means, are widely used for semantic summarization. Existing clustering-based summarization techniques assume that a summary is represented by the cluster centroids. However, the centroids might not represent actual data points in the summary. In addition, many clustering algorithms, such as the popular k-means algorithm, require the number of clusters as an input, which is not available for unsupervised summarization of unlabeled data. To address these issues, we propose a clustering-based semantic summarization technique using a combination of x-means and k-medoid clustering algorithms. Our experimental analysis shows that the proposed algorithm outperforms k-means-based summarization techniques.
  • Keywords
    data analysis; data mining; information retrieval; pattern clustering; bytes sequence; cluster centroids; clustering based semantic data summarization technique; data analysis technique; data mining techniques; data repositories; information extraction; information loss; k-means algorithm; k-medoid clustering algorithms; large dataset; semantic methods; syntactic methods; x-means clustering algorithms; Conferences; Decision support systems; Industrial electronics; Clustering; Data Summarization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Electronics and Applications (ICIEA), 2014 IEEE 9th Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4799-4316-6
  • Type

    conf

  • DOI
    10.1109/ICIEA.2014.6931456
  • Filename
    6931456
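
The abstract's point that a centroid need not be an actual data point, whereas a medoid always is, can be illustrated with a minimal sketch. This is not the authors' implementation: the cluster labels are assumed to be given already (in the paper they would come from x-means, which also chooses the number of clusters), and the function names `centroid`, `medoid`, and `summarize` as well as the toy data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a cluster; need not coincide with any input point."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def medoid(points):
    """The input point with the smallest total Euclidean distance to the others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


def summarize(points, labels):
    """Return one medoid per cluster label, so the summary holds only real records.

    Assumption: `labels` is supplied by an earlier clustering step.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return {label: medoid(members) for label, members in sorted(clusters.items())}


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0),
          (10.0, 10.0), (10.0, 12.0), (13.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

summary = summarize(points, labels)
first_centroid = centroid(points[:3])
```

Here `summary` contains only points drawn from `points`, while `first_centroid` falls between the members of the first group and matches none of them, which is the limitation of centroid-based summaries that the abstract describes.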