Abstract :
So far, the privileged instructions MONITOR and MWAIT introduced with Intel Prescott core, have been used mostly for inter-thread synchronization in operating systems code. In a hyper-threaded processor, these instructions offer a "performance-optimized" way for threads involved in synchronization events to wait on a condition. In this work, we explore the potential of using these instructions for synchronizing application threads that execute on hyper-threaded processors, and are characterized by workload asymmetry. Initially, we propose a framework through which one can use MON- ITOR/MWAIT to build condition wait and notification primitives, with minimal kernel involvement. Then, we evaluate the efficiency of these primitives in a bottom-up manner: at first, we quantify certain performance aspects of the primitives that reflect the execution model under consideration, such as resource consumption and responsiveness, and we compare them against other commonly used implementations. As a further step, we use our primitives to build synchronization barriers. Again, we examine the same performance issues as before, and using a pseudo-benchmark we evaluate the efficiency of our implementation for fine-grained inter-thread synchronization. In terms of throughput, our barriers yielded 12% better performance on average compared to Pthreads, and 26% compared to a spin-loops-based implementation, for varying levels of threads asymmetry. Finally, we test our barriers in a real- world scenario, and specifically, in applying thread-level Speculative Pre computation on four applications. For this multithreaded execution scheme, our implementation provided up to 7% better performance compared to Pthreads, and up to 40% compared to spin-loops-based barriers.
Keywords :
instruction sets; microprocessor chips; multi-threading; multiprocessing systems; synchronisation; Intel Prescott core; MONITOR; MWAIT; Pthreads; application threads; asymmetric thread synchronization; hyperthreaded processors; interthread synchronization; operating systems code; performance issues; resource consumption; synchronization barriers; workload asymmetry; Computer aided instruction; Delay; Hardware; Kernel; Pipelines; Processor scheduling; Spinning; Surface-mount technology; Testing; Yarn;