• DocumentCode
    580497
  • Title
    On the scalability of image and signal processing parallel applications on emerging cc-NUMA many-cores
  • Author
    Almaless, Ghassan ; Wajsburt, Franck
  • Author_Institution
    LIP6, UPMC, Paris, France
  • fYear
    2012
  • fDate
    23-25 Oct. 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Nowadays, single-chip cache-coherent multi-cores with up to 100 cores are a reality, and many-cores with hundreds of cores are planned for the near future. This technological shift undertaken by the high-end computer industry is converging with the design motivations of other domains such as the embedded and HPC industries. In this paper, we investigate the scalability of the same four unmodified, shared-memory, image and signal processing oriented parallel applications on two targets: (i) embedded - TSAR, a single-chip, 256-core, Cycle-Accurate-Bit-Accurate simulated cc-NUMA many-core; and (ii) high-end - an AMD Opteron Interlagos 64-core cc-NUMA many-core. Besides our scalability results on both cc-NUMA targets, our contributions include two operating system mechanisms: (i) a distributed, client/server based scheduler design allowing the kernel to offer scalable inter-thread synchronization mechanisms; and (ii) a kernel-level memory affinity technique named Auto-Next-Touch allowing the kernel to transparently and automatically migrate physical pages in order to enforce the locality of threads' memory accesses. Although these two mechanisms are implemented and evaluated in ALMOS (Advanced Locality Management Operating System) running on the TSAR target, they remain applicable to other shared-memory operating systems.
  • Keywords
    image processing; multi-threading; operating systems (computers); shared memory systems; ALMOS; AMD Opteron Interlagos; CC-NUMA many-cores; HPC industries; advanced locality management operating system; auto-next-touch; cycle-accurate-bit-accurate simulation; embedded TSAR; high-end computer-industry; image processing parallel applications; interthreads synchronization mechanisms scalability; kernel-level memory affinity technique; scheduler design; shared-memory applications; signal processing oriented parallel applications; single-chip 256-cores; single-chip cache-coherent multicores; thread memory accesses locality; Instruction sets; Kernel; Linux; Resource management; Scalability; Servers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on
  • Conference_Location
    Karlsruhe
  • Print_ISBN
    978-1-4673-2089-4
  • Electronic_ISBN
    978-2-9539987-4-0
  • Type
    conf
  • Filename
    6385369