Regional Information Center for Science and Technology - What and When Can We Gain From the Kernel Versions of C-Means Algorithm?

Abstract:

In the recent past, different kernelized versions of c-means (hard and fuzzy) clustering algorithms have been proposed. Here, we focus on kernel clustering of only object data, X = { x₁, ..., xₙ } ⊂ R^p. We first raise a basic question: Should we really cluster any given object data in the kernel space? The answer is NO! Here is our line of argument: 1) The objective of any clustering algorithm is to find natural subgroups in X, where the subgroups are defined by a measure of similarity between the vectors in X. 2) If we transform the data X into Y in another space by a nonlinear transformation and try to find clusters in Y, then such clusters can be useful if and only if Y helps us to find the same clusters that are present in X, because that is our objective. 3) If Y maintains the same structure/topology as that of X, then the use of Y may not give any advantage. 4) On the other hand, if Y changes the structure (i.e., imposes a new structure) on the data and that change makes the extraction of the desired clusters present in X easier, then clustering of Y is useful. 5) But when Y imposes new (nonexistent) structures, the clustering algorithm may find very strange clusters with no relation to the actual clusters present in X. 6) Thus, when we try to cluster in a transformed space, the issue is to know if it could help us to find the clusters present in X. To get any benefit from kernel clustering (or clustering in any other transformed space), we need to answer this question first; otherwise, we may find completely irrelevant clusters without knowing it, thereby making kernel clustering useless. 7) This issue is a philosophical one and depends neither on the choice of clustering algorithm nor on the particular transformation (kernel function) used. 8) Except for 2-D/3-D data, we do not know of any way to answer the question in 6), and for 2-D/3-D data, since we can look at the data, we do not need kernel clustering.
Therefore, there is no benefit from kernel clustering. We demonstrate and justify our claims using both synthetic and real datasets with visual assessment as well as with normalized mutual information, adjusted Rand index, and cluster instability. We propose to use Sammon's nonlinear projection method to get a crude visual representation of the data in the kernel space. We discuss the issue of how to choose appropriate parameters of the kernel function, but we could not provide a solution to this problem. Finally, we discuss how the kernel parameters and the algorithmic parameters interact.
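
The kind of comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes a Gaussian (RBF) kernel, covers only the hard variant of kernel c-means, and scores the resulting partition against reference labels with the adjusted Rand index. The toy data `X`, the labels `truth`, the initial partition `init`, the `sigma` values, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def rbf_kernel_matrix(X, sigma):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K

def kernel_hard_c_means(K, init_labels, c, max_iter=100):
    """Hard c-means in kernel space, starting from a given partition.

    The squared distance from point i to the (implicit) centre of
    cluster m is K[i][i] - 2*mean_j K[i][j] + mean_{a,b} K[a][b],
    with j, a, b ranging over the members of m.
    """
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == k] for k in range(c)]
        within = [
            sum(K[a][b] for a in m for b in m) / (len(m) ** 2) if m else 0.0
            for m in clusters
        ]
        new_labels = []
        for i in range(n):
            best_k, best_d = labels[i], float("inf")
            for k, m in enumerate(clusters):
                if not m:
                    continue  # an empty cluster has no centre to compare with
                cross = sum(K[i][j] for j in m) / len(m)
                d = K[i][i] - 2.0 * cross + within[k]
                if d < best_d:
                    best_k, best_d = k, d
            new_labels.append(best_k)
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
    return labels

def adjusted_rand_index(u, v):
    """Adjusted Rand index between two labelings of the same n >= 2 points."""
    def comb2(m):
        return m * (m - 1) / 2.0
    sum_ij = sum(comb2(cnt) for cnt in Counter(zip(u, v)).values())
    sum_a = sum(comb2(cnt) for cnt in Counter(u).values())
    sum_b = sum(comb2(cnt) for cnt in Counter(v).values())
    expected = sum_a * sum_b / comb2(len(u))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # both labelings are trivial (e.g. a single cluster each)
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical toy data: two well-separated groups of four points in 2-D.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
init = [0, 0, 0, 1, 1, 1, 1, 1]  # deliberately imperfect starting partition

for sigma in (0.5, 2.0, 50.0):  # illustrative kernel widths only
    labels = kernel_hard_c_means(rbf_kernel_matrix(X, sigma), init, c=2)
    print(sigma, labels, adjusted_rand_index(truth, labels))
```

Because the partition found may change with both `sigma` and the starting partition, a sweep like the loop above is one simple way to observe how the kernel parameter and the initialization interact. Note that it relies on reference labels, which are normally unavailable for real data.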