DocumentCode :
243759
Title :
Semi-Supervised Consensus Clustering: Reducing Human Effort
Author :
Vogel, Tobias ; Naumann, Felix
Author_Institution :
Hasso Plattner Inst., Potsdam, Germany
fYear :
2014
fDate :
14 Dec. 2014
Firstpage :
1095
Lastpage :
1104
Abstract :
Machine-based clustering yields fuzzy results. For example, when detecting duplicates in a dataset, different tools might end up with different clusterings. Eventually, a decision needs to be made, defining which records are in the same cluster, i.e., are duplicates. Such a definitive result is called a Consensus Clustering and can be created by evaluating the clustering attempts against each other and only resolving the disagreements by human experts. Yet, there can be different consensus clusterings, depending on the choice of disagreements presented to the human expert. In particular, they may require a different number of manual inspections. We present a set of strategies to select the smallest set of manual inspections to arrive at a consensus clustering and evaluate their efficiency on a set of real-world and synthetic datasets.
Keywords :
fuzzy set theory; pattern clustering; decision making; duplicate detection; fuzzy results; human effort reduction; machine-based clustering; manual inspections; real-world dataset; semisupervised consensus clustering; synthetic dataset; Clustering algorithms; Conferences; Inspection; Joints; Manuals; Merging; Standards; consensus clustering; duplicate detection; semi-supervision
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
Type :
conf
DOI :
10.1109/ICDMW.2014.97
Filename :
7022718