DocumentCode
2968578
Title
Mixed Group Discovery: Incorporating Group Linkage with Alternatively Consistent Social Network Analysis
Author
Huang, Shu
Author_Institution
Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., State College, PA, USA
fYear
2010
fDate
22-24 Sept. 2010
Firstpage
369
Lastpage
376
Abstract
Poor-quality data is widespread in database applications. In a relational database, each entity is associated with a group of relational records. For two entities with similar identifiers, the records of one entity may be mistakenly merged into the group of the other. This is referred to as the mixed group problem. In this paper, we formulate the problem of discovering mixed groups and propose two unsupervised algorithms as solutions. From the relational records, we observe that a group in one database is unlikely to be mixed in the same pattern as the group of the same entity in another, independent database. We also find that the collaborative relationships between entities tend to be alternatively consistent over time. By investigating and applying these properties, we propose two mixed group discovery algorithms, as well as a generic model that covers various situations. Experiments on both synthetic and real datasets from the Citeseer and ACM digital libraries show that our algorithms can identify mixed groups with more than 70% precision and 80% recall, and that the overall performance is significantly better than that of existing methods.
Keywords
digital libraries; relational databases; social networking (online); unsupervised learning; ACM digital libraries; Citeseer digital libraries; group linkage; independent database; mixed group discovery algorithms; relational database; social network analysis; unsupervised algorithms; Algorithm design and analysis; Clustering algorithms; Collaboration; Complexity theory; Couplings; Databases; Social network services;
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
Conference_Location
Pittsburgh, PA
Print_ISBN
978-1-4244-7912-2
Electronic_ISBN
978-0-7695-4154-9
Type
conf
DOI
10.1109/ICSC.2010.26
Filename
5629119