• DocumentCode
    2577952
  • Title
    Data clustering with modified K-means algorithm
  • Author
    Singh, Ran Vijay ; Bhatia, M.P.S.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Delhi, New Delhi, India
  • fYear
    2011
  • fDate
    3-5 June 2011
  • Firstpage
    717
  • Lastpage
    721
  • Abstract
    This paper presents a data clustering approach based on a modified K-means algorithm that reduces the algorithm's sensitivity to the initial cluster centers (seed points). The algorithm partitions the whole data space into segments and counts the data points in each segment. The segment with the highest data point frequency has the highest probability of containing a cluster centroid. As in the traditional K-means algorithm, the number of cluster centroids (k) is provided by the user, and the space is divided into k*k segments (k divisions vertically and k horizontally). If the highest data point frequency is the same in several segments and the upper bound of segments crosses the threshold k, merging of segments becomes mandatory, after which the k segments with the highest frequency are used to calculate the initial centroids (seed points) of the clusters. The paper also defines a threshold distance for each cluster centroid; comparing the distance between a data point and a centroid against this threshold reduces the computational effort of the distance calculations. The paper shows how the modified K-means algorithm decreases the complexity and the numerical effort while remaining as easy to implement as K-means, and how it assigns data points to their appropriate class or cluster more effectively.
  • Keywords
    data mining; pattern clustering; probability; cluster centroid; computational effort; data clustering; data point frequency; modified K-mean algorithm; numerical calculation; threshold distance; Algorithm design and analysis; Clustering algorithms; Data mining; Equations; Machine learning algorithms; Mathematical model; Partitioning algorithms; Data Clustering; K-Means
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Recent Trends in Information Technology (ICRTIT), 2011 International Conference on
  • Conference_Location
    Chennai, Tamil Nadu
  • Print_ISBN
    978-1-4577-0588-5
  • Type
    conf
  • DOI
    10.1109/ICRTIT.2011.5972376
  • Filename
    5972376
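  • Illustration
    The seeding step described in the abstract (divide the space into k*k segments, count the points in each, and seed from the k most populated segments) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. It assumes two-dimensional points and an equal-width grid over the bounding box of the data. The function name grid_seed_centroids, the use of each segment's mean as the seed, and the tie-breaking by segment index are all assumptions; the segment-merging rule and the threshold-distance test are not specified in the abstract and are left out.

```python
from collections import defaultdict


def grid_seed_centroids(points, k):
    """Pick initial centroids from the k most populated cells of a k x k grid.

    Hypothetical helper: the name, the equal-width grid, and the tie-breaking
    rule are assumptions made for illustration only.
    """
    if k < 1 or not points:
        raise ValueError("need k >= 1 and at least one point")

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1.0 when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    # Assign every point to one of the k*k cells.
    cells = defaultdict(list)
    for x, y in points:
        # Clamp so points on the upper boundary land in the last cell.
        col = min(int((x - x_min) / x_span * k), k - 1)
        row = min(int((y - y_min) / y_span * k), k - 1)
        cells[(row, col)].append((x, y))

    # Most populated cells first; ties broken by cell index (an assumption,
    # standing in for the merging step, which the abstract does not detail).
    ranked = sorted(cells.items(), key=lambda item: (-len(item[1]), item[0]))

    # Use the mean of each selected cell as a seed (also an assumption).
    # Fewer than k seeds are returned if fewer than k cells are non-empty.
    seeds = []
    for _, members in ranked[:k]:
        n = len(members)
        seeds.append((sum(m[0] for m in members) / n,
                      sum(m[1] for m in members) / n))
    return seeds


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
            (10.0, 10.0), (9.9, 9.8), (9.8, 10.0),
            (5.0, 0.0)]
    print(grid_seed_centroids(data, 2))
```

    The returned seeds would then be passed to an ordinary K-means loop in place of randomly chosen initial centers.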