Title :
Study on Sub Topic Clustering of Multi-Documents Based on Semi-Supervised Learning
Author_Institution :
Math., Phys. & Inf. Eng., Zhejiang Normal Univ., Jinhua, China
Abstract :
Sub-topic detection is an important step in multi-document summarization. This paper describes a new method for sub-topic detection based on semi-supervised learning: it first obtains the initial topic sets by hierarchical clustering and labels the sentences that score highly within each topic, then uses constrained k-means to decide the number of topics (k), and finally obtains the topic sets by k-means clustering. The experimental results indicate that the value of k obtained is stable.
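The constrained k-means step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the common seeded variant in which labeled points keep their cluster and the initial centroids are the means of the labeled points, and it assumes sentences are already represented as numeric vectors. The function name `constrained_kmeans`, the `seeds` mapping, and the toy data are hypothetical, and the abstract does not say how k is chosen, so that part is left out.

```python
from math import dist  # Euclidean distance, Python 3.8+


def constrained_kmeans(points, seeds, max_iter=100):
    """Seeded k-means sketch (assumed variant, not the paper's code).

    points: list of equal-length tuples of floats (e.g. sentence vectors).
    seeds:  {point index: cluster id}; every cluster id 0..k-1 must have
            at least one seed, and seeded points never change cluster.
    """
    k = max(seeds.values()) + 1
    assign = [seeds.get(i) for i in range(len(points))]  # None = unlabeled

    def centroids_of(assignment):
        cents = []
        for c in range(k):
            members = [points[i] for i, a in enumerate(assignment) if a == c]
            cents.append(tuple(sum(col) / len(members) for col in zip(*members)))
        return cents

    cents = centroids_of(assign)  # initial centroids come from the seeds only
    for _ in range(max_iter):
        new_assign = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda c: dist(p, cents[c]))
            for i, p in enumerate(points)
        ]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        cents = centroids_of(assign)
    return assign, cents


# Toy data: two well-separated groups, one labeled point in each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = {0: 0, 3: 1}
labels, centers = constrained_kmeans(points, seeds)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the pipeline the abstract describes, the seeds would come from the high-scoring sentences labeled after hierarchical clustering; here they are simply given by hand.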
Keywords :
document handling; learning (artificial intelligence); pattern clustering; hierarchy clustering; k-means clustering; multidocument handling; semisupervised learning; subtopic clustering; subtopic detection; Clustering algorithms; Clustering methods; Dictionaries; Monitoring; Semantics; Web pages;
Conference_Title :
Database Technology and Applications (DBTA), 2010 2nd International Workshop on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6975-8
Electronic_ISBN :
978-1-4244-6977-2
DOI :
10.1109/DBTA.2010.5659111