• DocumentCode
    2480825
  • Title
    Mamba: A scalable communication centric multi-threaded processor architecture
  • Author
    Chadwick, Gregory A.; Moore, Simon W.
  • Author_Institution
    Comput. Lab., Univ. of Cambridge, Cambridge, UK
  • fYear
    2012
  • fDate
    Sept. 30, 2012 - Oct. 3, 2012
  • Firstpage
    277
  • Lastpage
    283
  • Abstract
    In this paper we describe Mamba, an architecture designed for multi-core systems. Mamba has two major aims: (i) make on-chip communication explicit to the programmer so they can optimize for it, and (ii) support many threads and supply very lightweight communication and synchronization primitives for them. These aims are based on two observations: (i) as feature sizes shrink, on-chip communication becomes relatively more expensive than computation, and (ii) as we go increasingly multi-core, we need highly scalable approaches to inter-thread communication and synchronization. We employ a network of processors in which a given memory access always goes to the same cache, removing the need for a coherence protocol and giving the program explicit control over all communication. A presence bit associated with each word provides a very lightweight, fine-grained synchronization primitive. We demonstrate an FPGA implementation with micro-benchmarks of standard spinlock and FIFO implementations, and show that presence-bit-based implementations provide more efficient locking and lower-latency FIFO communication than a conventional shared-memory implementation, whilst also requiring fewer memory accesses. We also show that Mamba performance is insensitive to total thread count, allowing the use of as many threads as desired.
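    The per-word presence bit described in the abstract can be read as a full/empty-bit style primitive: a write fills a word and a read empties it, and each blocks while the word is in the wrong state. The sketch below models that behaviour in software under this assumption only; Mamba implements it in hardware, and its exact instruction semantics (for example, whether every read clears the bit) are not given in this record. The names `PresenceWord`, `PresenceLock`, `transfer` and `locked_count` are hypothetical, and a `threading.Condition` stands in for hardware thread blocking. It shows the two uses the abstract benchmarks: a one-word FIFO channel and a lock.

```python
import threading


class PresenceWord:
    """Hypothetical software model of one memory word with a presence bit.

    Assumed full/empty semantics: put() waits until the word is empty, then
    writes and sets the bit; take() waits until it is present, then reads
    and clears the bit. A condition variable stands in for hardware blocking.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._present = False  # presence bit: True while the word holds data
        self._value = None

    def put(self, value):
        with self._cond:
            while self._present:  # wait for a consumer to empty the word
                self._cond.wait()
            self._value = value
            self._present = True
            self._cond.notify_all()  # wake any thread waiting in take()

    def take(self):
        with self._cond:
            while not self._present:  # wait for a producer to fill the word
                self._cond.wait()
            value = self._value
            self._value = None
            self._present = False
            self._cond.notify_all()  # wake any thread waiting in put()
            return value


class PresenceLock:
    """Hypothetical lock: a present token in the word means 'unlocked'."""

    def __init__(self):
        self._word = PresenceWord()
        self._word.put(True)  # start unlocked with the token present

    def acquire(self):
        self._word.take()  # blocks until the token is present, then removes it

    def release(self):
        self._word.put(True)  # return the token, waking one waiting acquirer


def transfer(items):
    """Send items from a producer thread to the caller through one word."""
    word = PresenceWord()

    def producer():
        for item in items:
            word.put(item)

    t = threading.Thread(target=producer)
    t.start()
    received = [word.take() for _ in items]
    t.join()
    return received


def locked_count(n_threads, iters):
    """Increment a shared counter from several threads under a PresenceLock."""
    lock = PresenceLock()
    counter = [0]

    def worker():
        for _ in range(iters):
            lock.acquire()
            counter[0] += 1  # critical section protected by the presence bit
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(transfer([1, 2, 3]))  # prints [1, 2, 3]
    print(locked_count(4, 100))  # prints 400
```

    In this model, waiting on the presence bit replaces the repeated load-and-retry loop of a conventional spinlock, which is consistent with the abstract's claim of fewer memory accesses, though the sketch makes no attempt to reproduce the paper's measurements.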
  • Keywords
    cache storage; field programmable gate arrays; multi-threading; multiprocessing systems; parallel memories; queueing theory; synchronisation; FIFO; FPGA; Mamba; bit based implementation; fine grained synchronization primitive; interthread communication; lightweight communication; memory access; multicore system; multithreaded processor architecture; on-chip communication; optimization; scalable communication; Benchmark testing; Computer architecture; Field programmable gate arrays; Instruction sets; Message systems; Registers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Design (ICCD), 2012 IEEE 30th International Conference on
  • Conference_Location
    Montreal, QC
  • ISSN
    1063-6404
  • Print_ISBN
    978-1-4673-3051-0
  • Type
    conf
  • DOI
    10.1109/ICCD.2012.6378652
  • Filename
    6378652