• DocumentCode
    170761
  • Title
    TideWatch: Fingerprinting the cyclicality of big data workloads
  • Author
    Doug Williams; Shuai Zheng; Xiangliang Zhang; Hani Jamjoom
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2014
  • fDate
    April 27 - May 2, 2014
  • Firstpage
    2031
  • Lastpage
    2039
  • Abstract
    Intrinsic to “big data” processing workloads (e.g., iterative MapReduce and Pregel) are cyclical resource utilization patterns that are highly synchronized across different resource types as well as across the workers in a cluster. In Infrastructure as a Service settings, cloud providers do not exploit this characteristic to better manage VMs because they view VMs as “black boxes.” We present TideWatch, a system that automatically identifies cyclicality and similarity in running VMs. TideWatch predicts period lengths of most VMs in Hadoop workloads within 9% of actual iteration boundaries and successfully classifies up to 95% of running VMs as participating in the appropriate Hadoop cluster. Furthermore, we show how TideWatch can be used to improve the timing of VM migrations, reducing both migration time and network impact by over 50% when compared to a random approach.
  • Keywords
    cloud computing; data handling; iterative methods; resource allocation; virtual machines; Hadoop cluster; TideWatch; VM; big data processing workloads; big data workload cyclicality; black boxes; cloud providers; cyclical resource utilization; cyclical resource utilization patterns; infrastructure as a service settings; iteration boundaries; Computers; Conferences; Noise; Resource management; Smoothing methods; Synchronization; Time series analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2014 Proceedings IEEE
  • Conference_Location
    Toronto, ON
  • Type
    conf
  • DOI
    10.1109/INFOCOM.2014.6848144
  • Filename
    6848144