  • DocumentCode
    251798
  • Title
    Detecting Discontinuities in Large Scale Systems
  • Author
    Malik, Haroon ; Davis, Ian J. ; Godfrey, Michael W. ; Neuse, Douglas ; Mankovskii, Serge
  • Author_Institution
    David R. Cheriton Sch. of Comput., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2014
  • fDate
    8-11 Dec. 2014
  • Firstpage
    345
  • Lastpage
    354
  • Abstract
    Cloud providers and data centers rely heavily on forecasts to accurately predict future workload. This information helps them in appropriate virtualization and cost-effective provisioning of the infrastructure. The accuracy of a forecast greatly depends upon the merit of performance data fed to the underlying algorithms. One of the fundamental problems faced by analysts in preparing data for use in forecasting is the timely identification of data discontinuities. A discontinuity is an abrupt change in a time-series pattern of a performance counter that persists but does not recur. Analysts need to identify discontinuities in performance data so that they can a) remove the discontinuities from the data before building a forecast model and b) retrain an existing forecast model on the performance data from the point in time where a discontinuity occurred. There exist several approaches and tools to help analysts identify anomalies in performance data. However, there exists no automated approach to assist data center operators in detecting discontinuities in the first place. In this paper, we present and evaluate our proposed approach to help data center analysts and cloud providers automatically detect discontinuities. A case study on the performance data obtained from a large cloud provider and performance tests conducted using an open source benchmark system show that our proposed approach achieves, on average, a precision of 84% and a recall of 88%. The approach does not require any domain knowledge to operate.
  • Keywords
    cloud computing; computer centres; large-scale systems; time series; virtualisation; cloud provider; cost-effective provisioning; data center analyst; data center operator; data discontinuity; detecting discontinuity; domain knowledge; forecast model; large scale systems; open source benchmark system; performance counter; performance data; performance test; time-series pattern; underlying algorithm; virtualization; Analytical models; Data models; Forecasting; Mathematical model; Predictive models; Principal component analysis; Radiation detectors; Forecast; anomaly; data center; discontinuity;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/UCC.2014.44
  • Filename
    7027511