DocumentCode :
428912
Title :
Detecting buzz from time-sequenced document streams
Author :
Yi, Jeonghee
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
fYear :
2005
fDate :
29 March-1 April 2005
Firstpage :
347
Lastpage :
352
Abstract :
This paper presents a formal method for detecting emerging and changing interests in document streams that arrive continuously over time. Examples of such document streams include email, news articles, and Web logs (or blogs). We use the temporal information associated with the documents in a stream to discover emerging issues and topics of interest, and how they change, by detecting buzzwords in the documents. Buzzwords are terms that occur with strong momentum for a relatively short period of time. Our approach to buzz detection is based on the notion of "burst of activities" proposed by Kleinberg [2002], in which a burst is modeled using a weighted automaton. We propose an algorithm that discovers buzzwords of high intensity, measured by the momentum and relative duration of their bursts. The method is applied to and validated on a stream of blog postings, and we report the experimental results.
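As an illustration of the "burst of activities" idea the abstract builds on, here is a minimal Python sketch of a two-state weighted automaton in the style of Kleinberg's batched model. It is not the paper's algorithm: the function names `fit_cost` and `detect_bursts`, the parameters `s` (how much the burst state scales the base rate) and `gamma` (the cost of entering the burst state), and the sample counts are all assumptions chosen for illustration. The paper's own intensity measure (momentum and relative duration) is not reproduced here.

```python
import math

def fit_cost(p, r, d):
    """Negative log-likelihood of r term-bearing documents out of d at rate p."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) for each run of batches labeled as a burst.

    relevant[t] is the number of documents in batch t containing the term;
    totals[t] is the number of documents in batch t. s and gamma are
    illustrative defaults, not values taken from the paper.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)       # base rate (state 0)
    if n == 0 or p0 <= 0.0 or p0 >= 1.0:
        return []
    p1 = min(s * p0, 0.9999)               # elevated rate (state 1)
    up_cost = gamma * math.log(n)          # cost of moving from state 0 to 1

    inf = float("inf")
    cost = [0.0, inf]                      # the automaton starts in state 0
    back, gains = [], []
    for r, d in zip(relevant, totals):
        f0, f1 = fit_cost(p0, r, d), fit_cost(p1, r, d)
        gains.append(f0 - f1)              # how much better state 1 fits
        prev0 = 0 if cost[0] <= cost[1] else 1            # dropping to 0 is free
        prev1 = 0 if cost[0] + up_cost < cost[1] else 1   # entering 1 costs up_cost
        new0 = cost[prev0] + f0
        new1 = (cost[0] + up_cost if prev1 == 0 else cost[1]) + f1
        back.append((prev0, prev1))
        cost = [new0, new1]

    # Trace back the cheapest state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Collect maximal runs of state 1 as bursts.
    bursts, start = [], None
    for t, q in enumerate(states + [0]):
        if q == 1 and start is None:
            start = t
        elif q == 0 and start is not None:
            bursts.append((start, t - 1, sum(gains[start:t])))
            start = None
    return bursts

# Hypothetical counts: a term spikes in batches 3 to 5 of an 8-batch stream.
bursts = detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8)
flat = detect_bursts([2] * 8, [100] * 8)
```

In this sketch the burst weight is the total reduction in fitting cost obtained by labeling the run as a burst, so a larger weight suggests a stronger burst. A term with a steady rate, as in `flat`, never justifies the cost of entering the burst state.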
Keywords :
Internet; automata theory; data mining; document handling; Web logs; blog posting; buzzword detection; email; news articles; temporal information; time-sequenced document streams; weighted automaton; Automata; Blogs; Costs; Detection algorithms; Event detection; Frequency; Weight measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
e-Technology, e-Commerce and e-Service, 2005. EEE '05. Proceedings. The 2005 IEEE International Conference on
Print_ISBN :
0-7695-2274-2
Type :
conf
DOI :
10.1109/EEE.2005.57
Filename :
1402320