• DocumentCode
    1727560
  • Title
    Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances
  • Author
    du Plessis, Marthinus Christoffel ; Niu, Gang ; Sugiyama, Masashi
  • Author_Institution
    Dept. of Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
  • fYear
    2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying class. In this paper, we first show that this unsupervised labeling problem in balanced binary cases can be solved if two unlabeled datasets having different class balances are available. More specifically, estimation of the sign of the difference between probability densities of two unlabeled datasets gives the solution. We then introduce a new method to directly estimate the sign of the density difference without density estimation. Finally, we demonstrate the usefulness of the proposed method against several clustering methods on various toy problems and real-world datasets.
  • Keywords
    pattern clustering; probability; unsupervised learning; balanced binary cases; class balances; probability densities; unlabeled data clustering method; unlabeled datasets; unsupervised labeling problem; unsupervised learning problem; Benchmark testing; Clustering methods; Estimation; Kernel; Labeling; Linear programming; Support vector machines; class-balance change; clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Technologies and Applications of Artificial Intelligence (TAAI), 2013 Conference on
  • Conference_Location
    Taipei
  • Print_ISBN
    978-1-4799-2528-5
  • Type
    conf
  • DOI
    10.1109/TAAI.2013.15
  • Filename
    6783834
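
The abstract's central claim, that the sign of the density difference between the two unlabeled datasets solves the labeling problem, can be checked with a short derivation. This is only a sketch under assumptions that the record itself does not spell out: both datasets are taken to be mixtures of the same class-conditional densities, written here as p_+(x) and p_-(x), with different class priors π and π'. All symbols are illustrative and are not taken from the record.

```latex
\begin{align}
p(x)  &= \pi\,  p_{+}(x) + (1-\pi)\,  p_{-}(x), \\
p'(x) &= \pi'\, p_{+}(x) + (1-\pi')\, p_{-}(x), \\
p(x) - p'(x) &= (\pi - \pi')\,\bigl(p_{+}(x) - p_{-}(x)\bigr), \\
\operatorname{sign}\bigl(p(x) - p'(x)\bigr)
  &= \operatorname{sign}(\pi - \pi')\,
     \operatorname{sign}\bigl(p_{+}(x) - p_{-}(x)\bigr).
\end{align}
```

In the balanced binary case the optimal label at x is sign(p_+(x) - p_-(x)). Under the assumptions above, the sign of p(x) - p'(x) therefore recovers that label up to one global flip, which is fixed by whether π is larger or smaller than π'.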