• DocumentCode
    3522851
  • Title

    LiteTM: Reducing transactional state overhead

  • Author

    Jafri, Syed Ali Raza ; Thottethodi, Mithuna ; Vijaykumar, T.N.

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
  • fYear
    2010
  • fDate
    9-14 Jan. 2010
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Transactional memory (TM) has been proposed to address some of the programmability issues of chip multiprocessors. Hardware implementations of transactional memory (HTMs) have made significant progress in providing support for features such as long transactions that spill out of the cache, and context switches, page and thread migration in the middle of transactions. While essential for the adoption of HTMs in real products, supporting these features has resulted in significant state overhead. For instance, TokenTM adds at least 16 bits per block in the caches which is significant in absolute terms, and steals 16 of 64 (25%) memory ECC bits per block, weakening error protection. Also, the state bits nearly double the tag array size. These significant and practical concerns may impede the adoption of HTMs, squandering the progress achieved by HTMs. The overhead comes from tracking the thread identifier and the transactional read-sharer count at the Ll-block granularity. The thread identifier is used to identify the transaction, if only one, to which an Ll-evicted block belongs. The read-sharer count is used to identify conflicts involving multiple readers (i.e., write to a block with non-zero count). To reduce this overhead, we observe that the thread identifiers and read-sharer counts are not needed in a majority of cases. (1) Repeated misses to the same blocks are rare within a transaction (i.e., locality holds). (2) Transactional read-shared blocks that both are evicted from multiple sharers´ Lis and are involved in conflicts are rare. Exploiting these observations, we propose a novel HTM, called LiteTM, which completely eliminates the count and identifier and uses software to infer the lost information. Using simulations of the STAMP benchmarks running on 8 cores, we show that LiteTM reduces TokenTM´s state overhead by about 87% while performing within 4%, on average, and 10%, in the worst case, of TokenTM.
  • Keywords
    concurrency control; microprocessor chips; shared memory systems; LiteTM; Ll-block granularity; STAMP benchmarks; chip multiprocessors; thread identifier; thread migration; transactional memory; transactional read-sharer count; transactional state overhead; Delay; Error correction codes; Hardware; Impedance; Parallel programming; Power dissipation; Protection; Switches; System recovery; Yarn;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on
  • Conference_Location
    Bangalore
  • ISSN
    1530-0897
  • Print_ISBN
    978-1-4244-5658-1
  • Type

    conf

  • DOI
    10.1109/HPCA.2010.5416653
  • Filename
    5416653