Title :
The key technology of topic detection based on K-means
Author :
Li, Shengdong ; Lv, Xueqiang ; Wang, Tao ; Shi, Shuicai
Author_Institution :
Chinese Inf. Process. Res. Center, Beijing Inf. Sci. & Technol. Univ., Beijing, China
Abstract :
Text clustering is the key technology for topic detection, and topic detection is essentially a form of unsupervised clustering. However, general clustering relies on global information, whereas clustering in topic detection proceeds incrementally. Topic detection should therefore be studied in terms of the clustering algorithm, and clustering algorithms need in-depth and extensive research in this setting. The vector space model (VSM) is one of the simplest and most effective topic representation models, and K-means is a well-known and widely used partitional clustering method. We therefore develop a topic detection prototype system to study how the value of K in K-means affects topic detection. We then derive how variation in K affects topic detection and summarize its optimal values. Finally, evaluation with the TDT methods shows that the validity of the value of K in the algorithm is 83.33% in the K-means-based topic detection prototype system. This shows that the K-means clustering algorithm is well suited to topic detection.
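The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: documents are represented as VSM term vectors and grouped by K-means for a chosen K. The raw term-frequency weighting, the cosine similarity measure, the seeding with the first K documents, and all function names are assumptions made for illustration. This is a batch sketch and does not reproduce the incremental processing or the authors' prototype system.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (a simple VSM representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of sparse vectors, used as the cluster centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(docs, k, iterations=10):
    """Assign each document to one of k clusters; returns a list of labels."""
    if not 1 <= k <= len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    vectors = [tf_vector(d) for d in docs]
    # Assumption: seed centroids with the first k documents (deterministic).
    centroids = vectors[:k]
    labels = None
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each document.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical sample documents, not data from the paper.
sample_docs = [
    "stock market finance bank",
    "car engine automobile road",
    "bank finance loan stock",
    "automobile car road traffic",
]
```

Calling `kmeans(sample_docs, k)` for several values of `k` and scoring each result against labeled topics is one way to observe how K changes the detected topics. The scoring step is left out here because the abstract does not specify the TDT evaluation settings.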
Keywords :
Internet; pattern clustering; text analysis; K-means clustering algorithm; TDT evaluation methods; key technology; text clustering; topic detection prototype system; topics representation model; unsupervised clustering; vector space model; Art; Automobiles; Computers; Education; Finance; k-means; tdt evaluation; topic detection; vsm
Conference_Title :
Future Information Technology and Management Engineering (FITME), 2010 International Conference on
Conference_Location :
Changzhou
Print_ISBN :
978-1-4244-9087-5
DOI :
10.1109/FITME.2010.5656255