• DocumentCode
    2968578
  • Title

    Mixed Group Discovery: Incorporating Group Linkage with Alternatively Consistent Social Network Analysis

  • Author

    Huang, Shu

  • Author_Institution
    Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., State College, PA, USA
  • fYear
    2010
  • fDate
    22-24 Sept. 2010
  • Firstpage
    369
  • Lastpage
    376
  • Abstract
    Poor-quality data is widespread in database applications. In a relational database, each entity is associated with a group of relational records. For two entities with similar identifiers, the records of one entity may be mistakenly combined into the group of the other. This is referred to as the mixed group problem. In this paper, we formulate the problem of discovering mixed groups and propose two unsupervised algorithms as solutions. From the relational records, we observe that a group in one database is unlikely to be mixed in the same pattern as the group of the same entity in another, independent database. We also find that the collaborative relationships between entities tend to be alternatively consistent over time. By investigating and applying these properties, we propose two mixed group discovery algorithms, as well as a generic model that covers various situations. Empirical experiments on both synthetic and real datasets from the Citeseer and ACM digital libraries show that our algorithms can identify mixed groups with more than 70% precision and 80% recall, and that their overall performance is significantly better than that of existing methods.
  • Keywords
    digital libraries; relational databases; social networking (online); unsupervised learning; ACM digital libraries; Citeseer digital libraries; group linkage; independent database; mixed group discovery algorithms; relational database; social network analysis; unsupervised algorithms; Algorithm design and analysis; Clustering algorithms; Collaboration; Complexity theory; Couplings; Databases; Social network services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
  • Conference_Location
    Pittsburgh, PA
  • Print_ISBN
    978-1-4244-7912-2
  • Electronic_ISBN
    978-0-7695-4154-9
  • Type

    conf

  • DOI
    10.1109/ICSC.2010.26
  • Filename
    5629119