DocumentCode :
1866320
Title :
Topic Extraction for a Large Document Set with the Topic Integration
Author :
Yokoi, Takeru ; Yanagimoto, Hidekazu
Author_Institution :
Tokyo Metropolitan Coll. of Ind. Technol., Tokyo, Japan
fYear :
2010
fDate :
9-10 Jan. 2010
Firstpage :
46
Lastpage :
49
Abstract :
We propose a method for extracting topics from a large document set by integrating topics extracted from several small document sets. Topics are extracted by applying Non-negative Matrix Factorization (NMF) to the document sets. Integrating topics from small document sets is useful because extracting topics with NMF directly from a large document set takes a long time when the number of documents is large. In this paper, we shorten the processing time for topic extraction from a large document set by integrating the topics extracted from each of the small document sets. We evaluate the proposed method in terms of the compatibility between the integrated topics and the topics obtained by applying NMF directly to the large document set, and in terms of the NMF processing time.
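The abstract does not say how the topics are integrated or how compatibility is measured, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes the small document sets are row-wise partitions of one document-term matrix, that integration is done by running NMF a second time on the stacked per-partition topic vectors, and that compatibility is the best cosine similarity between topic vectors. The names `nmf`, `integrate_topics`, and `topic_compatibility` are hypothetical.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (docs x terms) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def integrate_topics(V, k, n_parts, iters=200, seed=0):
    """ASSUMPTION: split the documents into n_parts small sets, extract k
    topics from each, then factor the stacked topic vectors again."""
    parts = np.array_split(V, n_parts, axis=0)
    local_topics = [nmf(p, k, iters, seed + i)[1] for i, p in enumerate(parts)]
    stacked = np.vstack(local_topics)      # (n_parts * k) x terms
    _, H = nmf(stacked, k, iters, seed)    # k integrated topics x terms
    return H

def topic_compatibility(H_a, H_b):
    """ASSUMPTION: for each topic in H_a, the cosine similarity to its
    closest topic in H_b."""
    A = H_a / (np.linalg.norm(H_a, axis=1, keepdims=True) + 1e-12)
    B = H_b / (np.linalg.norm(H_b, axis=1, keepdims=True) + 1e-12)
    return (A @ B.T).max(axis=1)

if __name__ == "__main__":
    # Synthetic non-negative "document-term" data, for illustration only.
    V = np.random.default_rng(1).random((40, 12))
    H_int = integrate_topics(V, k=3, n_parts=4)
    _, H_full = nmf(V, 3)
    print(topic_compatibility(H_int, H_full))
```

In this sketch each factorization touches only one small matrix at a time, which is where any time saving over direct NMF on the full matrix would come from; whether that matches the paper's measurements cannot be confirmed from the abstract.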
Keywords :
data mining; document handling; information retrieval; matrix decomposition; large document set; nonnegative matrix factorization; text mining; topic extraction; topic integration; Data engineering; Educational institutions; IP networks; Independent component analysis; Information technology; Knowledge engineering; Mining industry
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Knowledge Discovery and Data Mining, 2010. WKDD '10. Third International Conference on
Conference_Location :
Phuket
Print_ISBN :
978-1-4244-5397-9
Electronic_ISBN :
978-1-4244-5398-6
Type :
conf
DOI :
10.1109/WKDD.2010.67
Filename :
5432741