Title :
Priority-Based Resource Scheduling in Distributed Stream Processing Systems for Big Data Applications
Author :
Bellavista, Paolo ; Corradi, Antonio ; Reale, Andrea ; Ticca, Nicola
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. di Bologna, Bologna, Italy
Abstract :
Distributed Stream Processing Systems (DSPSs) are attracting increasing industrial and academic interest as flexible tools to implement scalable and cost-effective on-line analytics applications over Big Data streams. Often hosted in private/public cloud deployment environments, DSPSs offer data stream processing services that transparently exploit the distributed computing resources made available to them at runtime. Given the volume of data of interest, possible (hard/soft) real-time processing requirements, and the time-variable characteristics of input data streams, it is very important for DSPSs to use smart and innovative scheduling techniques that allocate computing resources properly and avoid static over-provisioning. In this paper, we present an original investigation of the suitability of exploiting application-level indications about the differentiated priorities of stream processing tasks to enable application-specific DSPS resource scheduling, e.g., scheduling capable of re-shaping processing resources to dynamically follow input data peaks of prioritized tasks, with no static over-provisioning. We propose an original, general, and simple technique to design and implement priority-based resource scheduling in flow-graph-based DSPSs, by allowing application developers to augment DSPS graphs with priority metadata and by introducing an extensible set of priority schemas to be automatically handled by the extended DSPS. In addition, we show the effectiveness of our approach via its implementation and integration in our Quasit DSPS and through experimental evaluation of this prototype on a real-world stream processing application of Big Data vehicular traffic analysis.
Keywords :
Big Data; cloud computing; flow graphs; meta data; real-time systems; resource allocation; scheduling; Big Data applications; Big Data vehicular traffic analysis; Quasit DSPS; application-level indications; application-specific DSPS resource scheduling; computing resource allocation; data-stream processing services; distributed computing resources; distributed stream processing systems; flow-graph-based DSPS; hard real-time processing requirements; input data peaks; input datastreams; prioritized tasks; priority-based resource scheduling; processing resource re-shaping; real-world stream processing application; scalable cost-effective online analytics applications; soft real-time processing requirements; stream processing tasks; time-variable characteristics; Big data; Data models; Digital signal processing; Processor scheduling; Runtime; Scheduling; Throughput; Application-level and Application-specific Scheduling; Cloud Computing Optimization; Distributed Stream Processing; Priority-based Resource Scheduling; Vehicular Traffic Analysis
Conference_Title :
Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on
Conference_Location :
London
DOI :
10.1109/UCC.2014.46