Title :
A Thread Specific Load Balancing Technique for a Clustered SMT Architecture
Author :
Mehri, Maryam ; Hassanein, Wessam M.
Author_Institution :
Univ. of Calgary, Calgary
Abstract :
Clustering an architecture enables hardware to operate at high clock frequencies by grouping resources into small clusters. Local communication within a cluster then travels shorter distances, at the cost of more communications overall and longer inter-cluster communication latencies. Simultaneous multi-threaded (SMT) architectures allow better utilization of resources; a clustered SMT architecture therefore exploits the advantages of SMT and hides much of the inter-cluster communication latency incurred by clustering. This is achieved by executing instructions from a different thread when a thread stalls waiting for inter-cluster communication. In this work we study a new load balancing technique for a clustered SMT architecture, namely thread specific load balancing (TSLB). Unlike previous load balancing techniques, which balance the load across clusters based on the total number of instructions of all threads in each cluster, TSLB allows each thread to balance its load between clusters independently. With this policy, queues can take advantage of the inherent parallelism between instructions from different threads. In previous load balancing techniques, the overall number of instructions from the same thread assigned to distant clusters can be considerable, which results in a high number of inter-cluster communications. In contrast, TSLB maintains an almost constant average number of inter-cluster communications because instructions are distributed evenly between clusters. We assign a number to each thread in each cluster that indicates the maximum number of instructions from that thread that can be sent to that cluster for execution; these numbers are initially all equal but vary over time depending on the workload of each thread. This decreases potential queue conflicts for larger threads, allowing them to execute faster.
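The per-thread limits described in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the authors' hardware design: the class name `ThreadSpecificBalancer`, the `queue_size` parameter, the choice of the least-loaded cluster for each thread, and the proportional rule in `rebalance_caps` are all hypothetical, since the abstract does not say how the limits are updated.

```python
class ThreadSpecificBalancer:
    """Toy model of per-thread, per-cluster instruction limits (hypothetical)."""

    def __init__(self, num_clusters, num_threads, queue_size):
        self.queue_size = queue_size
        # Limits start out equal for every thread in every cluster.
        initial_cap = queue_size // num_threads
        self.cap = [[initial_cap] * num_clusters for _ in range(num_threads)]
        # count[t][c]: in-flight instructions of thread t in cluster c.
        self.count = [[0] * num_clusters for _ in range(num_threads)]

    def steer(self, thread_id):
        """Pick a cluster for the next instruction of one thread.

        Only this thread's own counts are consulted, so each thread
        balances its load independently. Returns None when the thread
        has reached its limit in every cluster.
        """
        counts = self.count[thread_id]
        caps = self.cap[thread_id]
        open_clusters = [c for c in range(len(counts)) if counts[c] < caps[c]]
        if not open_clusters:
            return None
        target = min(open_clusters, key=lambda c: counts[c])
        counts[target] += 1
        return target

    def retire(self, thread_id, cluster):
        """Release one slot when an instruction leaves the cluster."""
        self.count[thread_id][cluster] -= 1

    def rebalance_caps(self, demand):
        """Assumed update rule: limits proportional to each thread's demand."""
        total = sum(demand)
        if total == 0:
            return
        for t, d in enumerate(demand):
            share = max(1, self.queue_size * d // total)
            self.cap[t] = [share] * len(self.cap[t])
```

In this model, a thread with heavier recent demand receives a larger limit after `rebalance_caps`, which corresponds to the abstract's statement that the numbers vary with each thread's workload.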
Keywords :
multi-threading; parallel architectures; resource allocation; clustered SMT architecture; high clock frequencies; policy queues; simultaneous multi-threaded architectures; thread specific load balancing technique; Algorithm design and analysis; Clocks; Clustering algorithms; Computer architecture; Costs; Delay; Hardware; Load management; Surface-mount technology; Yarn;
Conference_Title :
Electrical and Computer Engineering, 2007. CCECE 2007. Canadian Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
1-4244-1020-7
ISSN :
0840-7789
DOI :
10.1109/CCECE.2007.242