Author_Institution :
Sch. of Electron. & Inf. Eng., Shenzhen Polytech., Shenzhen, China
Abstract :
Based on an analysis of the BBS topic model, topic similarity, topic inspection, topic evaluation standards, and topic development trends, this paper designs and implements a content-analysis-based topic detection algorithm for Chinese BBS forums. The algorithm obtains BBS information with a web crawler, processes it using URL and XPath page templates, segments the text into words with ICTLAS, clusters BBS topics with Carrot2, analyzes hot topics based on the power spectrum, and predicts BBS topic trends based on time series. Finally, a Chinese BBS topic detection system was developed with the J2EE development kit in the Eclipse integrated development environment, combined with Hibernate and GWT, and it achieved good results when tested on various BBS forums.
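The abstract does not say how the power spectrum or the time-series prediction is computed, so the following is only a minimal sketch of what the last two stages might look like. It is written in Python for brevity, although the described system is Java-based. It assumes each topic is represented by a list of post counts per time slot. The function names (`power_spectrum`, `hotness`, `forecast_next`), the use of one-sided spectral power as a hotness proxy, and the moving-average forecast are all illustrative assumptions, not the paper's method.

```python
import cmath
import math


def power_spectrum(counts):
    """Periodogram |X_k|^2 / n of the mean-centred series, for k = 0 .. n // 2."""
    n = len(counts)
    mean = sum(counts) / n
    centred = [c - mean for c in counts]
    spectrum = []
    for k in range(n // 2 + 1):
        # Discrete Fourier transform coefficient at frequency index k.
        x_k = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centred)
        )
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum


def hotness(counts):
    """Hypothetical hot-topic proxy: one-sided spectral power of the activity series.

    A flat series scores 0; a series with strong bursts scores high.
    """
    return sum(power_spectrum(counts))


def forecast_next(counts, window=3):
    """Assumed stand-in predictor: moving average of the last `window` values."""
    recent = counts[-window:]
    return sum(recent) / len(recent)


if __name__ == "__main__":
    # Made-up post counts per time slot for two hypothetical topics.
    topics = {
        "steady": [5, 5, 5, 5, 5, 5, 5, 5],
        "bursty": [1, 1, 2, 10, 30, 12, 3, 1],
    }
    ranked = sorted(topics, key=lambda name: hotness(topics[name]), reverse=True)
    for name in ranked:
        print(name, hotness(topics[name]), forecast_next(topics[name]))
```

Under these assumptions, topics would be ranked by `hotness` and the forecast used to flag topics whose activity is expected to keep rising. A real system would likely replace both with the measures defined in the paper itself.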
Keywords :
Internet; Java; information retrieval; pattern clustering; time series; BBS topic clustering; Carrot2; Chinese BBS topic detection algorithm; GWT technology; Google Web Toolkits; Hibernate technology; ICTLAS; J2EE development kit; URL; Web crawler; Xpath page template; content analysis; eclipse integrated development environment; time series prediction; Data mining; Data models; Databases; Java; Predictive models; Time series analysis; Web pages; BBS topic detection; Web crawler; algorithm; hot spot; topic clustering analysis;