DocumentCode :
179744
Title :
Automatic extraction of topics on big data streams through scalable advanced analysis
Author :
Romsaiyud, Walisa
Author_Institution :
Grad. Sch. of Inf. Technol., Siam Univ., Bangkok, Thailand
fYear :
2014
fDate :
July 30 2014-Aug. 1 2014
Firstpage :
255
Lastpage :
260
Abstract :
Extracting words, data patterns, and topic models from streaming big data in real time is a challenging task. Many applied machine learning techniques in data mining currently aim to exploit online feedback by updating models more quickly. However, the existing solutions Mahout and Massive Online Analysis (MOA) do not support streaming machine learning and are consequently not suitable for scaling across multiple machines. This paper enhances machine learning algorithms for extracting words and generating topic models from continuously arriving data. A major advantage of the proposed algorithm is that it can be scaled across multiple machines, which makes it well suited to real-time processing of streaming data. The algorithm comprises two main methods: (a) the first introduces a principled approach to pre-processing documents in their associated time sequence, and implements a class that detects identical files among the input files so as to reduce computation time; (b) the second supports real-time monitoring and control of the process across multiple asynchronous text streams. In the experiments, the two methods were executed alternately, and monotonic convergence was guaranteed after successive iterations. The experiments were conducted on a real-world dataset collected from the TREC KBA Stream Corpus in 2012. The results show that the proposed method is accurate, more robust to noise, and requires less computation.
Keywords :
data mining; learning (artificial intelligence); MOA; automatic extraction; big data streams; data mining; data patterns; machine learning algorithms; machine learning streaming; machine learning techniques; massive online analysis; real-time processing; scalable advanced analysis; topic models; word extraction; Analytical models; Computer architecture; Data mining; Data models; Distributed databases; Machine learning algorithms; Real-time systems; Big Data; Data Streaming; Machine Learning; Scalable Advanced Massive Online Analysis (SAMOA);
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Engineering Conference (ICSEC), 2014 International
Conference_Location :
Khon Kaen
Print_ISBN :
978-1-4799-4965-6
Type :
conf
DOI :
10.1109/ICSEC.2014.6978204
Filename :
6978204