A widely adopted methodology for verifying the memory subsystem of a Chip Multiprocessor (CMP) is to verify executions of parallel test programs on the CMP against the given memoryconsistency model, which has been lo...
详细信息
A widely adopted methodology for verifying the memory subsystem of a Chip Multiprocessor (CMP) is to verify executions of parallel test programs on the CMP against the given memoryconsistency model, which has been long known to be time consuming in both theory and practice. To accelerate memory consistency verification, previous approaches have to bear the cost of availability (e.g., relying on dedicated hardware supports that have not been offered by many commodity CMPs) or completeness (e.g., missing some bugs). In the meantime, the impact of parallel programs on memory consistency verification has more or less been overlooked. One piece of evidence is that few investigations have been dedicated to finding appropriate test programs enabling more efficient verification From a novel perspective of test program, we devise a practical technique called "program regularization," which can effectively reduce the computation time of memory consistency verification. The key intuition behind program regularization is that any parallel program, if being reformed appropriately, can enable efficient memory consistency verification. More specifically, for an original program, program regularization introduces some auxiliary memory addresses, and periodically inserts load/store operations accessing these addresses to the original program. With the regularized program, memory consistency verification can be accomplished in linear time (with respect to the number of memory operations) when the number of processors is fixed. Experimental results show that program regularization can significantly accelerate memory consistency verification. Last but not least, our technique, which does not rely on concrete verification algorithm or dedicated hardware support, can be smoothly integrated into existing presilicon/postsilicon verification platforms of industrial CMPs to speed up memory consistency verification.
Contemporary microprocessors use relaxed memoryconsistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In ...
详细信息
ISBN:
(纸本)9781467395243
Contemporary microprocessors use relaxed memoryconsistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memoryconsistency model is an NP-complete problem. Several graph-based approximations to this problem based on carefully constructed randomized test programs have been proposed in the literature;however, such approaches are sequential and execute slowly on large graphs of interest. Unfortunately, the ability to execute larger tests is tremendously important, since such tests enable one to expose bugs more quickly. Successfully executing more tests per unit time is also desirable, since it allows for one to check for a greater variety of errors in the memory subsystem by utilizing a more diverse set of tests. This paper improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. We first show performance improvements from a sequential approach and gain further performance from parallel implementations in OpenMP and CUDA. For large tests of interest, our GPU implementation achieves an average application speedup of 26.36x over existing techniques in use at NVIDIA.
暂无评论