DocumentCode :
249404
Title :
Building a Massive Stream Computing Platform for Flexible Applications
Author :
Tianjian Chen ; Zhengrui Man ; Hao Li ; Xin Sun ; Wong, Raymond K. ; Zhiwei Yu
Author_Institution :
Dept. of Basic Technol., Baidu Inc., Beijing, China
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
414
Lastpage :
421
Abstract :
Driven by the rapid growth of large scale real-time data mining applications for personalized ads and content recommendations, distributed stream processing systems are widely applied in modern big-data architectures. Designs of existing stream computing systems are mostly focusing on the scalability and availability issues. Other important issues which are essential to the actual cost and productivity, such as the fluctuating work load handling, the stream topology alternation efficiency and the computing topology overlapping, are not well studied. To address these issues in a live, production environment, a new stream processing architecture that is based on a scalability enhanced subscription model is proposed in this paper. We also present a system, called Vortex, that has been implemented using this new architecture. Vortex is a distributed stream computing system engineered to support flexible applications at Baidu. The new architecture enables Vortex to scale well for highly fluctuating workloads and perform on-demand stream topology alternations with minimal overheads. Furthermore, the dynamic message routing mechanism of Vortex allows one processing node to serve different stream topologies. This maximizes the computing resource utilization in the scenarios of topology overlapping. With all these features, Vortex is a powerful platform for both realtime data processing and Map-Reduce job acceleration. Finally, in this paper, we also discuss some applications at Baidu to demonstrate how Vortex can be deployed for various stream computing applications ranging from real-time analytics to the efficient large-scale data mining.
Keywords :
Big Data; Internet; data mining; parallel programming; resource allocation; software architecture; Baidu; Internet applications; MapReduce job acceleration; Vortex; availability issue; big-data architectures; computing resource utilization maximization; content recommendations; distributed stream processing systems; dynamic message routing mechanism; flexible applications; large scale real-time data mining applications; massive stream computing platform; on-demand stream topology alternations; personalized ads; real-time data processing; scalability enhanced subscription model; topology overlapping; Computational modeling; Computer architecture; Payloads; Routing; Scalability; Servers; Topology; Architecture; DRPC; Real World Applications; Stream Computing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
Type :
conf
DOI :
10.1109/BigData.Congress.2014.67
Filename :
6906810
Link To Document :
بازگشت