• DocumentCode
    751073
  • Title
    Dual clustering: integrating data clustering over optimization and constraint domains
  • Author
    Lin, Cheng-Ru; Liu, Ken-Hao; Chen, Ming-Syan
  • Author_Institution
    Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
  • Volume
    17
  • Issue
    5
  • fYear
    2005
  • fDate
    May 1, 2005
  • Firstpage
    628
  • Lastpage
    637
  • Abstract
    Spatial clustering has attracted considerable research attention because of its many applications. In most conventional clustering problems, similarity is measured mainly on geometric attributes. In many real applications, however, users are chiefly concerned with nongeometric attributes. Conventional spatial clustering partitions the input data set into several compact regions, so data points that are similar in their nongeometric attributes may be scattered over different regions, making the corresponding objective difficult to achieve. To remedy this, we propose and explore a new clustering problem over two domains, called dual clustering, where one domain is the optimization domain and the other is the constraint domain. Attributes in the optimization domain are those involved in optimizing the objective function, while those in the constraint domain specify the application-dependent constraints. Our goal is to optimize the objective function in the optimization domain while satisfying the constraint specified in the constraint domain. We devise an efficient and effective algorithm, named Interlaced Clustering-Classification (ICC), to solve this problem. ICC combines the information in both domains, iteratively performing a clustering algorithm on the optimization domain and a classification algorithm on the constraint domain to reach the target clustering effectively. The time and space complexities of ICC are formally analyzed. Several experiments provide insight into the dual clustering problem and the proposed algorithm.
  • Keywords
    computational complexity; data mining; optimisation; pattern clustering; spatial reasoning; visual databases; Interlaced Clustering-Classification algorithm; constraint domain; dual clustering; nongeometric attribute; optimization domain; spatial data clustering; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Constraint optimization; Data mining; Iterative algorithms; Partitioning algorithms; Pattern analysis; Pattern recognition; Scattering; data clustering
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.75
  • Filename
    1411742
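The abstract describes ICC only at a high level: a clustering step on the optimization domain interlaced with a classification step on the constraint domain. As a rough illustration of that interlacing idea, and explicitly not the authors' ICC algorithm, the sketch below alternates a k-means assignment on the optimization-domain attributes with a k-nearest-neighbour majority vote over constraint-domain coordinates, so that clusters formed on nongeometric attributes are pulled toward spatial coherence. The choice of k-means, the kNN vote standing in for "classification", and every name here (`interlaced_cluster_classify`, `n_neighbors`, `max_iter`, `seed`) are hypothetical assumptions; the paper's actual objective, constraint, and classifier may differ.

```python
import random
from collections import Counter


def _dist2(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroids(points, labels, k, old):
    # Mean optimization-domain vector per cluster; keep the old centroid if a cluster empties.
    cents = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            dim = len(members[0])
            cents.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
        else:
            cents.append(old[c])
    return cents


def interlaced_cluster_classify(opt, con, k, n_neighbors=3, max_iter=20, seed=0):
    """Hypothetical sketch: alternate k-means steps on `opt` with kNN relabeling on `con`."""
    rng = random.Random(seed)
    cents = [tuple(opt[i]) for i in rng.sample(range(len(opt)), k)]
    labels = None
    for _ in range(max_iter):
        # Clustering step (optimization domain): assign each point to its nearest centroid.
        clustered = [min(range(k), key=lambda c: _dist2(x, cents[c])) for x in opt]
        # Classification step (constraint domain): majority vote of the point itself
        # and its nearest constraint-domain neighbours; ties keep the point's own label.
        new_labels = []
        for i, p in enumerate(con):
            nbrs = sorted((j for j in range(len(con)) if j != i),
                          key=lambda j: _dist2(p, con[j]))[:n_neighbors]
            votes = Counter([clustered[i]])
            votes.update(clustered[j] for j in nbrs)
            new_labels.append(votes.most_common(1)[0][0])
        if new_labels == labels:
            break  # labels are stable: stop iterating
        labels = new_labels
        cents = _centroids(opt, labels, k, cents)
    return labels
```

For example, with two spatial groups of four points each, where one point in the first group has an optimization value resembling the second group, the neighbour vote keeps it with its spatial group:

```python
opt = [(0.0,), (0.1,), (0.2,), (5.0,), (5.0,), (5.1,), (5.2,), (5.3,)]
con = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = interlaced_cluster_classify(opt, con, k=2)
```

Using a majority vote rather than a hard spatial rule is only one way to express a constraint-domain preference; a real dual-clustering formulation would define the constraint and its classifier explicitly.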