ISBN: 9781424489350 (print)
Many important software applications are dominated by non-trivial serial components: Amdahl's Law places a hard upper bound on the speedup that can be achieved for these applications. In this paper, we propose an integrated software/hardware approach for accelerating hard serial bottlenecks in data-structure-heavy algorithms. The key idea is to overlap the processing of the main algorithmic functions with the data-structure-related operations. We describe the language, compiler, ISA and architectural support for such data structure co-processing (DSCP), and define a clean interface between the software and the hardware. We perform extensive simulations using the popular C++ STL container classes, as well as a detailed implementation of our approach for Dijkstra's single-source shortest path algorithm. We find potential for improvements that are well beyond what can be achieved with more conventional parallel computation methods.
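The bound that the abstract attributes to Amdahl's Law can be made concrete with a short sketch. This is the standard textbook form of the law, not anything taken from the paper itself; the function names `amdahl_speedup` and `amdahl_limit` and the example serial fraction of 25% are illustrative assumptions.

```python
import math


def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a given serial fraction.

    serial_fraction: share of the runtime that cannot be parallelized (0..1).
    workers: number of parallel processing units (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload in which 25% of the runtime is serial.
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.25, n), 3))
    print("limit:", amdahl_limit(0.25))
```

Under this model, adding workers never lifts the speedup past the reciprocal of the serial fraction, which is why shortening the serial part itself, as the abstract proposes, attacks the bound directly.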
In this study, a fast and accurate method to predict the radar cross-section (RCS) of large-scale targets with complicated shapes is proposed, based on a high-performance parallel finite-difference time-domain (FDTD) numerical method. To this end, several of the most popular parallel computation methods [including OpenMP, graphics processing unit (GPU), and message-passing interface (MPI)] are discussed first. Based on this discussion, a novel MPI-OpenMP-GPU hybrid parallel computation scheme for FDTD is developed. Moreover, the corresponding load-balanced parallel configuration is discussed as well. Since this hybrid parallel scheme combines the merits of existing parallel technologies, the computation performance is remarkably improved. The results show that the computation time of the RCS simulation of a large-scale target can be reduced from 3 days to 0.8 h, that is, a time saving of approximately 98.9%.
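The reported saving can be checked from the two runtimes quoted in the abstract. The sketch below assumes that "3 days" means 72 hours of wall-clock time; the helper names `time_saving` and `speedup_factor` are illustrative and do not come from the paper.

```python
def time_saving(baseline_hours: float, accelerated_hours: float) -> float:
    """Fraction of the baseline runtime that the accelerated run eliminates."""
    if baseline_hours <= 0 or accelerated_hours < 0:
        raise ValueError("runtimes must be positive")
    return 1.0 - accelerated_hours / baseline_hours


def speedup_factor(baseline_hours: float, accelerated_hours: float) -> float:
    """Ratio of baseline runtime to accelerated runtime."""
    if baseline_hours <= 0 or accelerated_hours <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_hours / accelerated_hours


# Assumption: 3 days of wall-clock time equals 72 hours.
BASELINE_HOURS = 3 * 24.0
ACCELERATED_HOURS = 0.8

saving = time_saving(BASELINE_HOURS, ACCELERATED_HOURS)
speedup = speedup_factor(BASELINE_HOURS, ACCELERATED_HOURS)

if __name__ == "__main__":
    print(f"time saving: {saving * 100:.1f}%")
    print(f"speedup: {speedup:.0f}x")
```

On these figures the saving rounds to 98.9%, consistent with the abstract, and corresponds to a speedup of roughly 90 times.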