Title :
QuickTM: A Hardware Solution to a High Performance Unbounded Transactional Memory
Author :
Sanyal, Sutirtha ; Roy, Sourav
Author_Institution :
Barcelona Supercomputing Center, Barcelona, Spain
Abstract :
Transactional Memory (TM) is an emerging technology that simplifies concurrency control in parallel programs. In this paper we propose QuickTM, a new hardware transactional memory (HTM) architecture. It incorporates three features that address known bottlenecks in existing HTM architectures. First, we propose hardware-only dynamic detection of true-shared variables. Our results show that true-shared variables account for only about 20% of the commit set of any transaction; the rest can be disregarded entirely during the commit phase. This drastically shortens every commit phase, resulting in a significant overall speed-up. Second, we keep both the speculative and the last committed version local to each processor. This is beneficial when a transaction is repeated in a loop, because processor requests are satisfied from the L1 data cache (L1D) itself. Furthermore, since both versions are maintained locally, the commit action involves only a broadcast of addresses. Third, we propose a mechanism to handle overflow in transactions. In our proposal, each processor continues to run transactions even if one processor has overflown its L1D. Our technique eliminates the stall of a thread even if it conflicts with the overflown transaction. The overflown transaction commits in place and periodically broadcasts its write-set addresses, a step termed “partial commit”. This gradually reduces conflicts and allows other threads to progress towards commit. Moreover, the technique requires no additional hardware at any level of the memory hierarchy beyond L1. QuickTM outperforms the state-of-the-art scalable HTM architecture, Scalable-TCC, by 20% on average on the latest TM benchmark suite, STAMP. It outperforms the original TCC proposal with serialized commit by 28% on average. The maximum speed-ups achieved in these two cases are 43% and 67%, respectively. Our proposal handles transaction overflow gracefully and outperforms the current overflow-aware HTM proposal, OneTM-concurrent, by 12% on average.
Keywords :
concurrency control; parallel architectures; parallel programming; shared memory systems; transaction processing; L1 data cache; QuickTM; hardware transactional memory architecture; hardware-only dynamic detection; high performance unbounded transactional memory; overflown transaction; parallel program; Cache Associativity; Cache-Coherence; Dynamic Separation; Hardware Transactional Memory; Tag Duplication; Transaction Overflow;
Conference_Title :
High Performance Computing and Communications (HPCC), 2010 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4244-8335-8
Electronic_ISBN :
978-0-7695-4214-0
DOI :
10.1109/HPCC.2010.86