• DocumentCode
    23346
  • Title

    A General Scalable and Elastic Content-Based Publish/Subscribe Service

  • Author

    Yijie Wang ; Xingkong Ma

  • Author_Institution
    Sci. & Technol. on Parallel & Distrib. Process. Lab., Nat. Univ. of Defense Technol., Changsha, China
  • Volume
    26
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 1 2015
  • Firstpage
    2100
  • Lastpage
    2113
  • Abstract
    The big data era is characterized by the emergence of live content with increasing complexity in data dimensionality and data size, which poses a new challenge to emergency applications: how to disseminate large-scale live content in a timely manner to interested users. The publish/subscribe (pub/sub) model is widely used to disseminate data because it allows a system to expand to Internet scale. However, existing pub/sub systems cannot meet the requirements of disseminating live content in the big data era, since their multi-hop routing techniques and coarse-grained partitioning techniques lead to low matching throughput, and their upload capacities do not scale well. In this paper, we propose a general scalable and elastic pub/sub service based on the cloud computing environment, called GSEC. For generality, we propose a two-layer pub/sub framework to support the dissemination of content with diverse data sizes and data dimensionality. For scalability, a hybrid space partitioning technique is proposed to achieve high matching throughput, which divides subscriptions into multiple clusters in a hierarchical manner. Moreover, a helper-based content distribution technique is proposed to achieve high upload bandwidth, where servers act as both providers and coordinators to fully exploit the upload capacity of the system. For elasticity, we propose a performance-aware provisioning technique that adjusts the number of servers to adapt to churn workloads. To evaluate the performance of GSEC, about 1,000 servers are deployed and hundreds of thousands of live content items are tested in our CloudStack-based testbed. Extensive experiments confirm that GSEC can linearly increase the capacities of event matching and content distribution as the number of servers grows, adaptively adjust these capacities within tens of seconds according to the churn workloads, and significantly outperform state-of-the-art approaches under various parameter settings.
  • Keywords
    Big Data; message passing; middleware; network routing; CloudStack-based testbed; Internet-scale; big data era; churn workloads; cloud computing environment; coarse-grained partitioning techniques; content distribution; elastic content-based publish-subscribe service; event matching; general scalability; helper-based content distribution technique; hybrid space partitioning technique; multihop routing techniques; performance-aware provisioning technique; pub-sub model; Clustering algorithms; Humidity; Routing; Scalability; Servers; Subscriptions; Throughput; Publish/subscribe; cloud computing; content distribution; event matching; space partitioning;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2014.2346759
  • Filename
    6876150