ISBN: (print) 9781450344937
Synchronization and data movement are the key impediments to efficient parallel execution. To ensure that data shared by multiple threads remain consistent, the programmer must use synchronization (e.g., mutex locks) to serialize threads' accesses to data. This limits parallelism because it forces threads to access shared resources sequentially. Additionally, systems use cache coherence to ensure that processors always operate on the most up-to-date version of a value, even in the presence of private caches. Coherence protocol implementations cause processors to serialize their accesses to shared data, further limiting parallelism and performance.
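The serialization described above can be illustrated with a minimal sketch, assuming a single shared counter updated by several threads. The function name `locked_increment` and its parameters are hypothetical and chosen only for this illustration; the abstract does not describe any particular workload.

```python
import threading


def locked_increment(num_threads: int = 4, increments_per_thread: int = 10_000) -> int:
    """Increment one shared counter from several threads under a mutex."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments_per_thread):
            # Critical section: only one thread may hold the lock at a time,
            # so the read-modify-write on `counter` is serialized.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The lock keeps the result consistent (the final count equals `num_threads * increments_per_thread`), but every increment waits its turn, which is the loss of parallelism the abstract refers to.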
ISBN: (print) 9781450344937
In this work, we introduce and experimentally evaluate a new hybrid software-hardware Transactional Memory (TM) prototype based on Intel's Haswell TSX architecture. Our prototype extends the applicability of the existing hardware support for TM by interposing a hybrid fall-back layer before the sequential, big-lock fall-back path that standard TSX-supported solutions use to guarantee progress. In our experimental evaluation we use SynQuake, a realistic game benchmark modeled after Quake. Our results show that our hybrid transactional system, which we call HythTM, reduces the number of transactions that go to the sequential software layer, hence avoiding hardware transaction aborts and loss of parallelism. HythTM improves application throughput and scalability by up to 5.05x compared to hardware TM with a sequential fall-back path.
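The layering described above (hardware attempt, then a hybrid layer, then the big-lock path) can be sketched as a retry policy. This is only an illustration under stated assumptions, not HythTM's implementation: the callables `try_hardware`, `try_hybrid`, and `run_sequential`, the name `run_transaction`, and both retry budgets are hypothetical, and the abstract does not say how the real system decides when to move between tiers.

```python
import threading
from typing import Callable

# Hypothetical global lock standing in for the sequential, big-lock fall-back path.
_big_lock = threading.Lock()


def run_transaction(
    try_hardware: Callable[[], bool],
    try_hybrid: Callable[[], bool],
    run_sequential: Callable[[], None],
    hw_retries: int = 3,
    hybrid_retries: int = 3,
) -> str:
    """Run one transaction through three tiers and report which tier committed it.

    try_hardware and try_hybrid return True on commit and False on abort.
    The retry budgets are illustrative assumptions, not values from the paper.
    """
    # Tier 1: attempt the (simulated) hardware transaction a few times.
    for _ in range(hw_retries):
        if try_hardware():
            return "hardware"
    # Tier 2: the interposed hybrid layer, tried before giving up parallelism.
    for _ in range(hybrid_retries):
        if try_hybrid():
            return "hybrid"
    # Tier 3: serialize behind the global lock, which always makes progress.
    with _big_lock:
        run_sequential()
    return "sequential"
```

For example, `run_transaction(lambda: False, lambda: True, lambda: None)` returns `"hybrid"`: the simulated hardware path aborts on every attempt, and the middle tier commits the transaction before the big lock is ever taken. The sketch omits details a real TSX-based system needs, such as making hardware transactions abort while another thread holds the fall-back lock.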
ISBN: (print) 9781450349826
This work proposes a low-overhead half-barrier pattern to schedule fine-grain parallel loops and considers its integration in the Intel OpenMP and Cilk Plus schedulers. Experimental evaluation demonstrates that the scheduling overhead of our techniques is 43% lower than that of Intel OpenMP and 12.1x lower than that of Cilk. We observe a 22% speedup on 48 threads, with a peak speedup of 2.8x.
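The overhead figures above mix two units ("43% lower" and "12.1x lower"), which makes them hard to compare at a glance. The small sketch below converts between the two, assuming the usual reading that "p% lower" means the new value is (1 - p/100) of the baseline and "k x lower" means the new value is the baseline divided by k. The helper names are hypothetical.

```python
def percent_lower_to_factor(percent: float) -> float:
    """Convert 'p% lower than baseline' into a reduction factor (baseline / new)."""
    return 1.0 / (1.0 - percent / 100.0)


def factor_to_percent_lower(factor: float) -> float:
    """Convert 'k x lower than baseline' into a percentage reduction."""
    return (1.0 - 1.0 / factor) * 100.0
```

Under that reading, 43% lower corresponds to a factor of roughly 1.75x, and 12.1x lower corresponds to a reduction of roughly 91.7%.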