Title :
Online topic detection and tracking of financial news based on hierarchical clustering
Author :
Dai, Xiang-ying ; Chen, Qing-cai ; Wang, Xiao-long ; Xu, Jun
Author_Institution :
Intell. Comput. Res. Center, Harbin Inst. of Technol., Shenzhen, China
Abstract :
In this paper, we apply topic detection and tracking (TDT) technology to a vertical search engine for the financial domain. Search results are grouped into topics on a per-stock basis, and the topics are presented to users in chronological order. As a result, users can easily learn about the important events related to a stock and trace the causes and effects of those events. We improve the standard agglomerative hierarchical clustering algorithm based on the average-link method and use it to implement both retrospective and online topic detection for stock news stories. In addition, an improved single-pass clustering algorithm is employed for topic tracking. On the premise that feature terms occurring in the title of a news story contribute more to the similarity calculation, we increase their weights. Experiments are performed on two manually annotated datasets. The results show that the proposed method can effectively detect and track online financial topics.
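The abstract names three mechanisms: boosting the weights of title terms in a vector space model, average-link agglomerative hierarchical clustering for topic detection, and single-pass clustering for topic tracking. The Python sketch below illustrates plain textbook versions of these under stated assumptions; it is not the authors' improved algorithm, whose modifications are not described in this record. The boost factor `TITLE_BOOST`, the similarity threshold, whitespace tokenization, and the helper names (`story_vector`, `cosine`, `average_link_clusters`, `track`) are all hypothetical. The clustering step is shown as a batch (retrospective) pass, and tracking assigns a story to the topic with the highest average cosine similarity to its members, which is one possible choice; comparing against a topic centroid would be another.

```python
import math
from collections import Counter

# Hypothetical boost: each title occurrence of a term counts this much,
# each body occurrence counts 1. The paper's actual weighting is not given here.
TITLE_BOOST = 2.0


def story_vector(title, body, title_boost=TITLE_BOOST):
    """Term-weight vector for a news story, with title terms up-weighted."""
    vec = Counter(body.lower().split())
    for term in title.lower().split():
        vec[term] += title_boost
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_link_clusters(vectors, threshold):
    """Plain average-link agglomerative clustering (retrospective detection).

    Starts with one cluster per story and repeatedly merges the pair of
    clusters with the highest average pairwise similarity, stopping when
    that similarity falls below `threshold` (a hypothetical parameter).
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # y > x, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def track(story_vec, topics, vectors, threshold):
    """Plain single-pass tracking: attach a new story to the most similar
    topic (average similarity to its members), or open a new topic if no
    topic reaches `threshold`. Mutates and returns `topics`."""
    vectors.append(story_vec)
    idx = len(vectors) - 1
    best, best_topic = -1.0, None
    for topic in topics:
        s = sum(cosine(story_vec, vectors[i]) for i in topic) / len(topic)
        if s > best:
            best, best_topic = s, topic
    if best_topic is not None and best >= threshold:
        best_topic.append(idx)
    else:
        topics.append([idx])
    return topics
```

In this sketch the single threshold serves both as the stopping criterion for merging and as the novelty test for tracking; in practice these would likely be tuned separately on annotated data.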
Keywords :
information retrieval; pattern clustering; portals; search engines; stock markets; text analysis; time series; TDT technology; agglomerative hierarchical clustering algorithm; average-link method; financial news; online topic detection; online topic tracking; retrospective topic detection; single pass clustering algorithm; stock news; topic tracking; vertical search engine; Clustering algorithms; Clustering methods; Computational modeling; Cybernetics; Machine learning; Measurement; Web pages; Agglomerative Hierarchical Clustering; Topic Detection and Tracking; Vector Space Model;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
DOI :
10.1109/ICMLC.2010.5580677