Title :
Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances
Author :
du Plessis, Marthinus Christoffel ; Niu, Gang ; Sugiyama, Masakazu
Author_Institution :
Dept. of Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
Abstract :
We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when the data is properly clustered and each cluster corresponds to an underlying class. In this paper, we first show that this unsupervised labeling problem can be solved in balanced binary cases if two unlabeled datasets having different class balances are available. More specifically, estimating the sign of the difference between the probability densities of the two unlabeled datasets gives the solution. We then introduce a new method that directly estimates the sign of the density difference without density estimation. Finally, we demonstrate the usefulness of the proposed method in comparison with several clustering methods on various toy problems and real-world datasets.
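Illustration (not from the paper): the key claim of the abstract can be checked on a toy problem. If both datasets are mixtures of the same two class densities p+ and p-, with class priors t and t', then p(x) - p'(x) = (t - t')(p+(x) - p-(x)), so the sign of the density difference matches the balanced-class decision rule up to a global label flip. The sketch below is a minimal example under assumptions of our own: one-dimensional Gaussian classes, priors 0.8 and 0.2, and a naive plug-in kernel density estimate with a hand-picked bandwidth. The plug-in estimate is a stand-in, not the direct sign estimator that the paper proposes, and all function and variable names are hypothetical.

```python
import numpy as np


def kde(points, queries, bandwidth):
    """Gaussian kernel density estimate of 1-D `points`, evaluated at `queries`."""
    diffs = (queries[:, None] - points[None, :]) / bandwidth
    norm = len(points) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * diffs ** 2).sum(axis=1) / norm


def sample_mixture(rng, n, prior_pos):
    """Draw n points: class +1 ~ N(+1, 1) with probability prior_pos, else class -1 ~ N(-1, 1)."""
    labels = np.where(rng.random(n) < prior_pos, 1, -1)
    return rng.normal(loc=labels.astype(float), scale=1.0), labels


rng = np.random.default_rng(0)

# Two unlabeled datasets with different class balances (assumed: 0.8 vs 0.2).
# The true labels are kept only to score the result afterwards.
x_a, y_a = sample_mixture(rng, 1000, prior_pos=0.8)
x_b, y_b = sample_mixture(rng, 1000, prior_pos=0.2)

x_all = np.concatenate([x_a, x_b])
y_all = np.concatenate([y_a, y_b])

# Plug-in estimate of the sign of the density difference (assumed bandwidth 0.3).
pred = np.sign(kde(x_a, x_all, 0.3) - kde(x_b, x_all, 0.3))
pred[pred == 0] = 1  # break exact ties arbitrarily

# Labels are only identifiable up to a global flip, so score both assignments.
raw = np.mean(pred == y_all)
accuracy = max(raw, 1.0 - raw)
print(f"labeling accuracy up to label flip: {accuracy:.3f}")
```

The two classes overlap heavily here, so the pooled data does not form two clean clusters, yet the sign of the estimated density difference still recovers a sensible labeling. Estimating two densities and subtracting them is the indirect route; the abstract's point is that the sign can be estimated directly instead.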
Keywords :
pattern clustering; probability; unsupervised learning; balanced binary cases; class balances; probability densities; unlabeled data clustering method; unlabeled datasets; unsupervised labeling problem; unsupervised learning problem; Benchmark testing; Clustering methods; Estimation; Kernel; Labeling; Linear programming; Support vector machines; class-balance change; clustering
Conference_Title :
Technologies and Applications of Artificial Intelligence (TAAI), 2013 Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4799-2528-5
DOI :
10.1109/TAAI.2013.15