Title :
A Distributed Text Mining System for Online Web Textual Data Analysis
Author :
Zhou, Bin ; Jia, Yan ; Liu, Chunyang ; Zhang, Xu
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Real-world Web mining applications usually have diverse requirements, such as massive data processing, low system latency, and high scalability. To meet these requirements, we propose a distributed text mining system with a layered architecture that divides the system functions into three layers: the crawling and storage layer, the basic mining layer, and the analysis service layer. Message-oriented middleware is used between the components and services of these layers so that they communicate in a loosely coupled way. To cope with data-intensive processing and storage failures, a distributed file system is used to store and manage the raw text data and various indexes. As a case study, the design and implementation of an experimental online topic detection application, which can scale to handle thousands of Internet news and forum channels and perform online analysis, is also discussed.
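The layered design in the abstract, where each layer talks to the next only through message queues, can be illustrated with a minimal single-process sketch. It assumes in-process `queue.Queue` objects as a stand-in for the message-oriented middleware, because the record does not name the product the authors used. The layer functions, the `None` end-of-stream sentinel, the whitespace tokenizer and the term-count "analysis" are all hypothetical illustrations, not the authors' implementation.

```python
import queue
import threading
from collections import Counter

STOP = None  # hypothetical end-of-stream sentinel passed between layers

def crawling_and_storage_layer(docs, out_q):
    # Stand-in for crawling + storage: emit each raw (doc_id, text) downstream.
    for doc_id, text in docs:
        out_q.put((doc_id, text))
    out_q.put(STOP)

def basic_mining_layer(in_q, out_q):
    # Stand-in for basic mining: turn each document into term counts.
    while (item := in_q.get()) is not STOP:
        doc_id, text = item
        out_q.put((doc_id, Counter(text.lower().split())))
    out_q.put(STOP)

def analysis_service_layer(in_q, result):
    # Stand-in for an analysis service: aggregate term counts across documents.
    totals = Counter()
    while (item := in_q.get()) is not STOP:
        _, counts = item
        totals.update(counts)
    result["top_terms"] = totals.most_common(3)

def run_pipeline(docs):
    # Layers share no state except the queues, so any layer could be replaced
    # or moved to another process/host behind a real message broker.
    raw_q, mined_q = queue.Queue(), queue.Queue()
    result = {}
    threads = [
        threading.Thread(target=crawling_and_storage_layer, args=(docs, raw_q)),
        threading.Thread(target=basic_mining_layer, args=(raw_q, mined_q)),
        threading.Thread(target=analysis_service_layer, args=(mined_q, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["top_terms"]

if __name__ == "__main__":
    docs = [(1, "flood warning issued"), (2, "flood relief effort"),
            (3, "flood warning lifted")]
    print(run_pipeline(docs))
```

The queues are the only coupling point between layers, which is the property the abstract attributes to message-oriented middleware; a production system would replace them with a networked broker and persist raw text in a distributed file system rather than in memory.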
Keywords :
Internet; data analysis; data mining; middleware; text analysis; Internet news; Web mining; analysis service layer; crawling layer; data processing; distributed file system; distributed text mining system; forum channels; layered architecture; message-oriented middleware; mining layer; online Web textual data analysis; online analysis; online topic detection; raw text data management; raw text data storage; storage layer; system latency; Crawlers; Data analysis; Distributed databases; Internet; Text mining; distributed computing; information discovery; text mining;
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2010 International Conference on
Conference_Location :
Huangshan
Print_ISBN :
978-1-4244-8434-8
Electronic_ISBN :
978-0-7695-4235-5
DOI :
10.1109/CyberC.2010.11