Title :
Categorization using semi-supervised clustering
Author :
Hu, Jianying ; Singh, Moninder ; Mojsilovic, Aleksandra
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY
Abstract :
Many applications require matching objects to a predefined, yet highly dynamic set of categories accompanied by category descriptions. We present a novel approach to solving this class of categorization problems by formulating it in a semi-supervised clustering framework. Text-based matching is performed to generate "soft" seeds, which are then used to guide clustering in the basic feature space. We introduce a new variation of the k-means algorithm, called Soft Seeded k-means, which can effectively incorporate seeds that are of varying degrees of confidence, while allowing for incomplete coverage of the predefined categories. The algorithm is applied to real-world data from a business analytics application, and we demonstrate that it leads to superior performance compared to previous approaches.
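The abstract does not give the algorithm's details, so the following is only a minimal sketch of one way confidence-weighted seeds could guide k-means; it is not the authors' formulation. The function name `soft_seeded_kmeans`, the `penalty` parameter, and the additive penalty scheme are all assumptions made for illustration. Seeded categories are initialized from the confidence-weighted mean of their seeds, categories without seeds fall back to a random data point (incomplete coverage), and a seeded point pays a cost proportional to its confidence if it leaves its seed category.

```python
import numpy as np

def soft_seeded_kmeans(X, k, seed_labels, seed_conf, penalty=1.0, n_iter=20, rng=None):
    """Illustrative sketch only (hypothetical names, assumed penalty scheme).

    X           : (n, d) feature matrix
    seed_labels : (n,) int array, category index of the seed or -1 if unseeded
    seed_conf   : (n,) float array in [0, 1], confidence of each seed
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    seed_conf = np.asarray(seed_conf, dtype=float)
    rng = np.random.default_rng(rng)
    n, d = X.shape

    # Initialize each center from its seeds, weighted by confidence.
    # Categories with no seeds start from a random data point.
    centers = np.empty((k, d))
    for c in range(k):
        mask = seed_labels == c
        w = seed_conf[mask]
        if w.sum() > 0:
            centers[c] = np.average(X[mask], axis=0, weights=w)
        else:
            centers[c] = X[rng.integers(n)]

    seeded = seed_labels >= 0
    rows = np.arange(seeded.sum())
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k).
        cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Assumed soft constraint: a seeded point pays penalty * confidence
        # for every category other than the one its seed suggests.
        extra = penalty * seed_conf[seeded][:, None] * np.ones((1, k))
        extra[rows, seed_labels[seeded]] = 0.0
        cost[seeded] += extra
        labels = cost.argmin(axis=1)
        # Standard k-means update; empty clusters keep their previous center.
        for c in range(k):
            members = labels == c
            if members.any():
                centers[c] = X[members].mean(axis=0)
    return labels, centers
```

In this sketch, setting `penalty` to zero reduces the seeds to an initialization hint only, while a very large `penalty` effectively pins confident seeds to their suggested category.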
Keywords :
learning (artificial intelligence); pattern clustering; text analysis; business analytics application; categorization; category descriptions; feature space; k-means algorithm; objects matching; semisupervised clustering; soft seeded k-means; text-based matching; Algorithm design and analysis; Clustering algorithms; Explosions; Image databases; Internet; Labeling; Performance analysis; Project management; Spatial databases; Taxonomy;
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
Print_ISBN :
978-1-4244-2174-9
Electronic_ISBN :
1051-4651
DOI :
10.1109/ICPR.2008.4761253