• DocumentCode
    2693060
  • Title
    Online detection of utility cloud anomalies using metric distributions
  • Author
    Wang, Chengwei ; Talwar, Vanish ; Schwan, Karsten ; Ranganathan, Parthasarathy
  • Author_Institution
    Center for Exp. Res. in Comput. Syst., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    96
  • Lastpage
    103
  • Abstract
    The online detection of anomalies is a vital element of operations in data centers and in utility clouds like Amazon EC2. Given ever-increasing data center sizes coupled with the complexities of systems software, applications, and workload patterns, such anomaly detection must operate automatically, at runtime, and without the need for prior knowledge about normal or anomalous behaviors. Further, detection should function at different levels of abstraction, such as hardware and software, and for the multiple metrics used in cloud computing systems. This paper proposes EbAT (Entropy-based Anomaly Testing), offering novel methods that detect anomalies by analyzing the distributions of arbitrary metrics rather than individual metric thresholds. Entropy is used as a measurement that captures the degree of dispersal or concentration of such distributions, aggregating raw metric data across the cloud stack to form entropy time series. For scalability, such time series can then be combined hierarchically and across multiple cloud subsystems. Experimental results on utility cloud scenarios demonstrate the viability of the approach. EbAT outperforms threshold-based methods with, on average, a 57.4% improvement in anomaly detection accuracy, and also does better by 59.3% on average in false alarm rate compared with a 'near-optimum' threshold-based method.
  • Keywords
    Internet; computer centres; program testing; software metrics; Amazon EC2; EbAT; anomaly detection; data centers; entropy-based anomaly testing; metric distributions; online detection; utility cloud anomalies; Application software; Cloud computing; Dispersion; Entropy; Hardware; Runtime; Scalability; System software; Testing; Time measurement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Operations and Management Symposium (NOMS), 2010 IEEE
  • Conference_Location
    Osaka
  • ISSN
    1542-1201
  • Print_ISBN
    978-1-4244-5366-5
  • Electronic_ISBN
    1542-1201
  • Type
    conf
  • DOI
    10.1109/NOMS.2010.5488443
  • Filename
    5488443
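
The abstract describes using entropy to capture how dispersed or concentrated a metric's distribution is, and forming an entropy time series from raw metric data. The following is a minimal sketch of that general idea only, not the EbAT algorithm itself, whose details are not given in this record. The function names, the fixed-width binning, the non-overlapping windows, and the sample CPU values are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(samples, bin_width=10.0):
    """Shannon entropy (in bits) of the empirical distribution of binned samples.

    Fixed-width binning is an assumption of this sketch, not taken from the paper.
    """
    if not samples:
        return 0.0
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = len(samples)
    # Sum of p * log2(1/p) over the occupied bins.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_series(samples, window=4, bin_width=10.0):
    """Entropy of each consecutive, non-overlapping window of samples."""
    return [
        shannon_entropy(samples[i:i + window], bin_width)
        for i in range(0, len(samples) - window + 1, window)
    ]


# Hypothetical CPU utilization readings (percent); the last window is widely dispersed.
cpu = [41, 43, 45, 47, 42, 44, 46, 48, 5, 95, 33, 71]
series = entropy_series(cpu)
print(series)
```

In this toy input the first two windows fall entirely into one bin, so their entropy is zero, while the last window spreads across four bins and its entropy rises. A detector in this spirit would watch the series for such changes instead of comparing each raw reading against a fixed threshold.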