• DocumentCode
    628203
  • Title

    CloudPD: Problem determination and diagnosis in shared dynamic clouds

  • Author

    Sharma, Bhanu P ; Jayachandran, Prasanth ; Verma, A. ; Das, Chita R.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2013
  • fDate
    24-27 June 2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    In this work, we address problem determination in virtualized clouds. We show that high dynamism, resource sharing, frequent reconfiguration, high propensity to faults and automated management introduce significant new challenges towards fault diagnosis in clouds. Towards this, we propose CloudPD, a fault management framework for clouds. CloudPD leverages (i) a canonical representation of the operating environment to quantify the impact of sharing; (ii) an online learning process to tackle dynamism; (iii) correlation-based performance models for higher detection accuracy; and (iv) an integrated end-to-end feedback loop to synergize with a cloud management ecosystem. Using a prototype implementation with cloud representative batch and transactional workloads like Hadoop, Olio and RUBiS, it is shown that CloudPD detects and diagnoses faults with low false positives (< 16%) and high accuracy of 88%, 83% and 83%, respectively. In an enterprise trace-based case study, CloudPD diagnosed anomalies within 30 seconds and with an accuracy of 77%, demonstrating its effectiveness in real-life operations.
  • Keywords
    cloud computing; fault diagnosis; fault tolerant computing; resource allocation; security of data; virtualisation; CloudPD; anomaly diagnosis; canonical representation; cloud fault diagnosis; cloud management ecosystem; cloud representative batch; correlation-based performance models; enterprise trace-based case study; fault management framework; integrated end-to-end feedback loop; online learning process; problem determination; problem diagnosis; resource sharing; shared dynamic clouds; transactional workloads; virtualized clouds; Biological system modeling; Context; Correlation; Engines; Measurement; Monitoring; Servers; Cloud; Fault Diagnosis; Hadoop MapReduce; Performance; Problem Determination; Virtualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks (DSN), 2013 43rd Annual IEEE/IFIP International Conference on
  • Conference_Location
    Budapest
  • ISSN
    1530-0889
  • Print_ISBN
    978-1-4673-6471-3
  • Type

    conf

  • DOI
    10.1109/DSN.2013.6575298
  • Filename
    6575298