• DocumentCode
    1125893
  • Title

    In-kernel integration of operating system and infiniband functions for high performance computing clusters: a DSM example

  • Author

    Liss, Liran ; Birk, Yitzhak ; Schuster, Assaf

  • Author_Institution
    Dept. of Electr. Eng., Technion-Israel Inst. of Technol., Haifa, Israel
  • Volume
    16
  • Issue
    9
  • fYear
    2005
  • Firstpage
    830
  • Lastpage
    840
  • Abstract
    The InfiniBand (IB) system area network (SAN) enables applications to access hardware directly from user level, reducing the overhead of user-kernel crossings during data transfer. However, distributed applications that exhibit close coupling between network and OS services may benefit from accessing IB from the kernel through IB's native verbs interface, which permits tight integration of these services. We assess this approach using a sequential-consistency distributed shared memory (DSM) system as an example. We first develop primitives that abstract the low-level communication and kernel details, and efficiently serve the application's communication, memory, and scheduling needs. Next, we combine the primitives to form a kernel DSM protocol. The approach is evaluated using our full-fledged Linux kernel DSM implementation over InfiniBand. We show that overheads are reduced substantially, and overall application performance is improved in terms of both absolute execution time and scalability relative to an entirely user-level implementation.
  • Keywords
    Linux; computer network reliability; distributed shared memory systems; local area networks; network interfaces; operating system kernels; parallel processing; Linux kernel; hardware-software interface; high performance computing cluster; high-speed network; infiniband network; operating system kernel; parallel computing; scheduling; sequential-consistency distributed shared memory system; system area network; Access protocols; Application software; Computer Society; Hardware; High performance computing; Kernel; Linux; Operating systems; Scalability; Storage area networks; Hardware/software interfaces; distributed shared memory; high-speed networks; parallel computing
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2005.111
  • Filename
    1490513