• DocumentCode
    249404
  • Title

    Building a Massive Stream Computing Platform for Flexible Applications

  • Author

    Tianjian Chen ; Zhengrui Man ; Hao Li ; Xin Sun ; Raymond K. Wong ; Zhiwei Yu

  • Author_Institution
    Dept. of Basic Technol., Baidu Inc., Beijing, China
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    414
  • Lastpage
    421
  • Abstract
    Driven by the rapid growth of large-scale real-time data mining applications for personalized ads and content recommendations, distributed stream processing systems are widely used in modern big-data architectures. Existing stream computing systems are designed mostly around scalability and availability. Other issues that are essential to actual cost and productivity, such as handling fluctuating workloads, the efficiency of stream topology alternation, and overlapping computing topologies, are not well studied. To address these issues in a live production environment, this paper proposes a new stream processing architecture based on a scalability-enhanced subscription model. We also present Vortex, a system implemented using this architecture. Vortex is a distributed stream computing system engineered to support flexible applications at Baidu. The new architecture enables Vortex to scale well under highly fluctuating workloads and to perform on-demand stream topology alternations with minimal overhead. Furthermore, the dynamic message routing mechanism of Vortex allows one processing node to serve different stream topologies, which maximizes computing resource utilization when topologies overlap. With these features, Vortex is a powerful platform for both real-time data processing and MapReduce job acceleration. Finally, we discuss several applications at Baidu to demonstrate how Vortex can be deployed for stream computing applications ranging from real-time analytics to efficient large-scale data mining.
  • Keywords
    Big Data; Internet; data mining; parallel programming; resource allocation; software architecture; Baidu; Internet applications; MapReduce job acceleration; Vortex; availability issue; big-data architectures; computing resource utilization maximization; content recommendations; distributed stream processing systems; dynamic message routing mechanism; flexible applications; large scale real-time data mining applications; massive stream computing platform; on-demand stream topology alternations; personalized ads; real-time data processing; scalability enhanced subscription model; topology overlapping; Computational modeling; Computer architecture; Payloads; Routing; Scalability; Servers; Topology; Architecture; DRPC; Real World Applications; Stream Computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2014.67
  • Filename
    6906810