• DocumentCode
    2708804
  • Title

    Unsupervised Cross-Domain Learning by Interaction Information Co-clustering

  • Author

    Ando, Shin ; Suzuki, Einoshin

  • Author_Institution
    Grad. Sch. of Eng., Gunma Univ., Kiryu
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    13
  • Lastpage
    22
  • Abstract
    In real-world data mining applications, one often has access to multiple datasets that are relevant to the task at hand. However, learning from such datasets can be difficult as they are often drawn from different domains, i.e., they are not identically distributed or they differ in their class or feature sets. In this paper, we consider the problem of learning the class structures of related domains in an unsupervised manner. Its setting generalizes that of information filtering and novelty detection applications, which address both known and unknown classes. We propose a co-clustering framework for estimating and adapting the class structures of two related domains, enabling the analysis of shared and unique classes. We define an objective function using interaction information to account for the divergence between the corresponding clusters of the respective domains. We present an iterative algorithm which alternates between object and feature clustering and converges to a local minimum of the objective function. We present empirical results on text benchmarks, comparing the proposed algorithm with combinations of conventional approaches on the problems of partitioning documents and detecting unknown topics.
  • Keywords
    data mining; information filtering; unsupervised learning; interaction information coclustering; novelty detection; unsupervised cross-domain learning; Minority Clustering; co-clustering; domain adaptation; information theoretic clustering; interactive information
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type

    conf

  • DOI
    10.1109/ICDM.2008.92
  • Filename
    4781096