DocumentCode :
3142202
Title :
News topic detection based on hierarchical clustering and named entity
Author :
Sheng Huang ; Xueping Peng ; Zhendong Niu ; Kunshan Wang
Author_Institution :
Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol., Beijing, China
fYear :
2011
fDate :
27-29 Nov. 2011
Firstpage :
280
Lastpage :
284
Abstract :
News topic detection is the process of organizing news story collections and real-time news/broadcast streams into news topics. Unlike traditional text analysis, it is an incremental clustering process, generally divided into retrospective topic detection and online topic detection. This paper considers how the features of modern news data have changed compared with the past, and presents a new topic detection strategy based on hierarchical clustering and named entities. The topic detection process is likewise divided into retrospective and online steps, and the named entities in news stories are used in the topic clustering algorithm. To improve the efficiency and precision of the online step, the paper first clusters the news stories in each time window into micro-clusters, and then extracts three representation vectors for each micro-cluster to calculate its similarity to existing topics. The experimental results show a remarkable improvement over the most widely applied recent topic detection method.
Keywords :
Internet; pattern clustering; text analysis; hierarchical clustering; incremental clustering process; named entity recognition; news topic detection; online topic detection; retrospective topic detection; topic clustering algorithm; Measurement; Organizing; agglomerative hierarchical clustering; named entity; vector space model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
Conference_Location :
Tokushima
Print_ISBN :
978-1-61284-729-0
Type :
conf
DOI :
10.1109/NLPKE.2011.6138209
Filename :
6138209