• DocumentCode
    2052255
  • Title
    On-the-fly kernel updates for high-performance computing clusters
  • Author
    Makris, Kristis; Ryu, Kyung Dong
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ
  • fYear
    2006
  • fDate
    25-29 April 2006
  • Abstract
    High-performance computing clusters running long-lived tasks currently cannot have kernel software updates applied to them without causing system downtime. These clusters miss opportunities for increased performance via specialized kernel support, cannot benefit from new kernel features, and continue to operate with kernel security holes unpatched, at least until the next scheduled maintenance date. We developed a system enabling dynamic kernel updates in parallel computing clusters to address these problems. Our system, DynAMOS, is founded on execution flow hijacking through function cloning. It enables commodity operating systems popularly used in clusters to gain adaptive and mutative capabilities. To demonstrate the efficacy of our system, we illustrate our experience in dynamically updating and extending a Linux cluster. We introduce adaptive memory paging for efficient gang-scheduling, extend the kernel's process scheduler to support unobtrusive fine-grain cycle stealing, apply public security fixes, and inject performance monitoring functionality into a selection of kernel functions. Our benchmarks show that the overhead imposed by DynAMOS is mostly in the range of 1-8% for common Linux kernel functions.
  • Keywords
    operating system kernels; scheduling; software maintenance; workstation clusters; DynAMOS; Linux cluster; adaptive memory paging; commodity operating systems; execution flow hijacking; fine-grain cycle stealing; function cloning; gang scheduling; high-performance computing clusters; kernel software updates; parallel dynamic kernel updates; performance monitoring; public security fixes; Application software; Cloning; Instruments; Kernel; Linux; Magnetohydrodynamic power generation; Operating systems; Parallel processing; Processor scheduling; Security
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
  • Conference_Location
    Rhodes Island
  • Print_ISBN
    1-4244-0054-6
  • Type
    conf
  • DOI
    10.1109/IPDPS.2006.1639690
  • Filename
    1639690