Title :
Accurate latency estimation in a distributed event processing system
Author :
Chandramouli, Badrish ; Goldstein, Jonathan ; Barga, Roger ; Riedewald, Mirek ; Santos, Ivo
Abstract :
A distributed event processing system consists of one or more nodes (machines), and can execute a directed acyclic graph (DAG) of operators called a dataflow (or query), over long-running high-event-rate data sources. An important component of such a system is cost estimation, which predicts or estimates the “goodness” of a given input, i.e., operator graph and/or assignment of individual operators to nodes. Cost estimation is the foundation for solving many problems: optimization (plan selection and distributed operator placement), provisioning, admission control, and user reporting of system misbehavior. Latency is a significant user metric in many commercial real-time applications. Users are usually interested in quantiles of latency, such as worst-case or 99th percentile. However, existing cost estimation techniques for event-based dataflows use metrics that, while they may have the side-effect of being correlated with latency, do not directly or provably estimate latency. In this paper, we propose a new cost estimation technique using a metric called Mace (Maximum cumulative excess). Mace is provably equivalent to maximum system latency in a (potentially complex, multi-node) distributed event-based system. The close relationship to latency makes Mace ideal for addressing the problems described earlier. Experiments with real-world datasets on Microsoft StreamInsight deployed over 1-13 nodes in a data center validate our ability to closely estimate latency (within 4%), and the use of Mace for plan selection and distributed operator placement.
Keywords :
costing; data flow graphs; directed graphs; distributed processing; optimisation; query processing; Mace; Microsoft StreamInsight; admission control; commercial real time application; cost estimation; directed acyclic graph; distributed event processing system; event based dataflow; high event rate data source; latency estimation; maximum cumulative excess; maximum system latency; operator graph; Estimation; Measurement; Nickel; Optimal scheduling; Real time systems; Runtime; Silicon;
Conference_Title :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
Print_ISBN :
978-1-4244-8959-6
ISSN :
1063-6382
DOI :
10.1109/ICDE.2011.5767926