• Title of article

    Constraint-based discriminative dimension selection for high-dimensional stream clustering

  • Author/Authors

    Waiyamai, Kitsana, Department of Computer Engineering - Kasetsart University - Bangkok, Thailand; Kangkachit, Thanapat, College of Innovative Technology and Engineering - Dhurakij Pundit University - Bangkok, Thailand

  • Pages
    13
  • From page
    167
  • To page
    179
  • Abstract
    Clustering data streams is an active research topic in data mining. However, the runtime of existing stream clustering algorithms increases and their performance drops as the number of dimensions grows. One way to reduce the clustering complexity is to determine an appropriate subset of cluster dimensions via dimension projection. SED-Stream is an efficient clustering algorithm that supports high-dimensional data streams. The aim of this paper is to improve the performance of SED-Stream in terms of both clustering quality and execution time. To improve the clustering process, background or domain expert knowledge is integrated as “constraints” in a new algorithm, SEDC-Stream. SEDC-Stream supports the evolving characteristics of dynamic constraints, namely activation, fading, outdating, and prioritization. It is able to reduce cluster splitting time and to place new incoming points into their suitable clusters. Compared to SED-Stream on three real-world stream datasets, SEDC-Stream achieves better clustering performance in terms of both purity and F-measure.
  • Keywords
    Constraint-based clustering, Projected clustering, Dimension selection, High-dimensional data streams, Incremental stream clustering
  • Journal title
    International Journal of Advances in Intelligent Informatics
  • Serial Year
    2018
  • Record number

    2601101