This paper accelerates the exact evaluation of large numbers of 3D geometric predicates with an algorithm whose work is partitioned between the CPU and the GPU on a high-performance computer to exploit the relative strengths of each. The test algorithm computes all the red-blue intersections between a set of red 3D triangles and another set of blue 3D triangles. A sequence of filters is employed that progressively eliminates more and more red-blue pairs that do not intersect, finally leaving only the actual intersections. Initially, a uniform grid is constructed on the GPU to identify pairs of nearby triangles. Then, these pairs are tested for intersection with single-precision interval arithmetic on the GPU. The ambiguous cases are next filtered with double-precision interval arithmetic on the multi-core CPU, and finally the hard cases are re-evaluated in parallel on the CPU using arbitrary-precision rational numbers. The parallel speedup for the whole algorithm was up to 414 times. It took only 1.17 s to find the 18M intersections between two datasets containing a total of 14M triangles. The intersection computation was sped up by up to 1936 times. The techniques that gave this excellent performance should be useful for parallelizing other geometric algorithms in fields such as CAD, GIS, and 3D modeling. (C) 2022 Elsevier Ltd. All rights reserved.
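The filter cascade described above can be illustrated on a single predicate. The following is a minimal sketch, not the paper's implementation: it runs on the CPU only, collapses the single- and double-precision interval stages into one double-precision stage, and uses a 3D orientation test as a stand-in predicate. The names `Interval`, `det3`, and `orient3d` are hypothetical, and `math.nextafter` assumes Python 3.9 or later.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi], widened outward by one ulp after each operation."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _widen(lo, hi):
        # Round-to-nearest results are within half an ulp of the true value,
        # so stepping one float outward keeps the true value enclosed.
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._widen(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._widen(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval._widen(min(p), max(p))


def det3(a, b, c):
    """3x3 determinant; works for any type supporting +, - and *."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def orient3d(p, q, r, s):
    """Return (sign, stage): the sign of det[q-p, r-p, s-p] and the stage that decided it."""
    def rows(conv):
        P, Q, R, S = ([conv(float(x)) for x in pt] for pt in (p, q, r, s))
        return [[X[i] - P[i] for i in range(3)] for X in (Q, R, S)]

    # Stage 1: cheap interval filter. Decides whenever the interval excludes zero.
    iv = det3(*rows(Interval))
    if iv.lo > 0:
        return 1, "interval"
    if iv.hi < 0:
        return -1, "interval"

    # Stage 2: ambiguous case, re-evaluated exactly with rational numbers.
    exact = det3(*rows(Fraction))
    return (exact > 0) - (exact < 0), "exact"
```

A clearly non-degenerate tetrahedron is settled by the interval stage, while four coplanar points fall through to the exact stage, which mirrors how only the hard cases reach the most expensive filter.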
We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic blocks of parallel programming: reduce, prefix sum (scan), and reduce by index. We first present derivati...
In high-performance computing (HPC), the demand for efficient parallel programming models has grown dramatically since the end of Dennard Scaling and the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice due to its simplicity and portability, offering a directive-driven approach for shared-memory parallel programming. Despite its wide adoption, however, there is a lack of comprehensive data on the actual usage of OpenMP constructs, hindering unbiased insights into its popularity and evolution. This paper presents a statistical analysis of OpenMP usage and adoption trends based on a novel and extensive database, HPCORPUS, compiled from GitHub repositories containing C, C++, and Fortran code. The results reveal that OpenMP is the dominant parallel programming model, accounting for 45% of all analyzed parallel APIs. Furthermore, it has demonstrated steady and continuous growth in popularity over the past decade. Analyzing specific OpenMP constructs, the study provides in-depth insights into their usage patterns and preferences across the three languages. Notably, we found that while OpenMP has a strong "common core" of constructs in common usage (while the rest of the API is less used), there are new adoption trends as well, such as simd and target directives for accelerated computing and task for irregular parallelism. Overall, this study sheds light on OpenMP's significance in HPC applications and provides valuable data for researchers and practitioners. It showcases OpenMP's versatility, evolving adoption, and relevance in contemporary parallel programming, underlining its continued role in HPC applications and beyond. These statistical insights are essential for making informed decisions about parallelization strategies and provide a foundation for further advancements in parallel programming models and techniques. HPCORPUS, as well as the analysis scripts and raw results, are available at: https://***/Scientific-Computing-Lab-NRCN/HP
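As a rough illustration of what counting OpenMP constructs in source code can look like, here is a minimal sketch. It is not the HPCORPUS analysis pipeline, whose scripts are not reproduced here: the regular expression, the function name `count_directives`, and the choice to tally only the first word of each directive are all assumptions made for this example.

```python
import re
from collections import Counter

# Matches C/C++ "#pragma omp <word>" and Fortran "!$omp <word>" at the start of a line.
# Only the first word of the directive is captured (an assumption of this sketch),
# so "parallel for" and "parallel do" are both tallied under "parallel".
DIRECTIVE = re.compile(
    r"^[ \t]*(?:#[ \t]*pragma[ \t]+omp|!\$omp)[ \t]+([a-z_]+)",
    re.IGNORECASE | re.MULTILINE,
)


def count_directives(source):
    """Tally OpenMP directives in a source string by their leading construct name."""
    return Counter(m.group(1).lower() for m in DIRECTIVE.finditer(source))


# Hypothetical input mixing a C fragment and a Fortran fragment.
sample = """
#pragma omp parallel for reduction(+:total)
for (int i = 0; i < n; i++) total += a[i];
#pragma omp simd
for (int i = 0; i < n; i++) b[i] = 2.0f * a[i];
#pragma omp task
process(node);
!$OMP PARALLEL DO
"""

counts = count_directives(sample)
```

A real study would also need to handle line continuations, comments, and combined constructs, which this sketch ignores.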
Lately, parallel task models have received much attention in the development of real-time multiprocessor systems, as they allow highly compute-intensive tasks to have shorter deadlines, which modern reactive systems require. However, missing modularity and portability can make parallel programming a cumbersome endeavor. As a consequence, compute-intensive sectors in the desktop and server segment have relied on parallelism frameworks such as Intel Threading Building Blocks, Cilk and OpenMP. These parallelism frameworks, however, are optimized for decent average-case performance and consequently do not meet the strict requirements imposed by real-time *** this paper, we present a proof-of-concept parallelism framework implemented specifically for soft real-time systems, with the tight timing and safety requirements of such critical systems in mind. The proposed runtime system implements static memory allocation in a work-stealing environment that conforms to the strict space and tight probabilistic time bounds of work-stealing schedulers. Furthermore, we evaluate the performance of this framework by conducting multiprogrammed benchmarks on a real-time embedded multicore architecture.
Parallel programming remains a daunting challenge, from struggling to express a parallel algorithm without cluttering the underlying synchronous logic to describing which devices to employ to calculate correctness. Ov...
High-Performance Computing (HPC) is one of the strategic priorities for research and innovation worldwide due to its relevance for industrial and scientific applications. We envision HPC as composed of three pillars: ...
To address the reduced speed and loss of estimation accuracy caused by replacing sequential importance resampling with Metropolis-Hastings (MH) resampling in the particle filter, this paper proposes a parallel Metropolis-Hastings particle filter based on a multi-prediction framework, which shifts the particle-filtering load from resampling to the prediction and update steps. The overhead of the multi-prediction framework can easily be compensated by a parallel implementation. The algorithm reduces global sequential operations by adding local parallel computation. Simulation experiments show that the method improves both real-time performance and state estimation accuracy.
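For context, the baseline Metropolis-Hastings resampling step that the abstract refers to can be sketched as follows. This is the generic textbook-style MH resampler, not the multi-prediction algorithm the paper proposes, whose details the abstract does not give. The function name `mh_resample` and the parameter `steps` (chain length per particle) are hypothetical.

```python
import random


def mh_resample(weights, steps, rng):
    """Pick one ancestor index per particle with an independent Metropolis-Hastings chain.

    Each chain only compares weight ratios, so no global sum or prefix scan over
    the weights is needed, and the chains could run in parallel (here they run
    one after another for simplicity).
    """
    n = len(weights)
    ancestors = []
    for i in range(n):
        k = i  # every chain starts at its own particle
        for _ in range(steps):
            j = rng.randrange(n)  # uniform proposal over all particles
            # Accept with probability min(1, weights[j] / weights[k]),
            # written without a division so that zero weights are safe.
            if rng.random() * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


# Hypothetical example: all of the weight sits on particle 2.
ancestors = mh_resample([0.0, 0.0, 1.0, 0.0], steps=200, rng=random.Random(0))
```

Short chains make the result biased toward each chain's starting particle, which is one source of the accuracy loss that motivates alternatives to plain MH resampling.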
Hardware Transactional Memory (HTM) is a high-performance instantiation of the powerful programming abstraction of transactional memory, which simplifies the daunting, yet critically important, task of parallel programming. While many HTM implementations of varying complexity exist in the literature, commercially available HTMs impose rigid restrictions on transaction and system behavior, limiting their practical use. A key constraint is the limited size of supported transactions, implicitly capped by hardware buffering capacity. We identify the opportunity to expand the effective capacity of these limited hardware structures by being more selective about the memory accesses that need to be tracked. We leverage compiler and virtual memory support to identify safe memory accesses, which can never cause a transaction abort, and pass them as safety hints to the underlying HTM. With minor extensions over a conventional HTM implementation, HinTM uses these hints to selectively allocate transactional state tracking resources to unsafe accesses only, thus expanding the HTM’s effective capacity and, in turn, reducing capacity aborts. We demonstrate that HinTM effectively augments the performance of a range of baseline HTM configurations. When coupled with a POWER8 HTM implementation, HinTM eliminates 64% of transactional capacity aborts, achieving 1.4× average speedup, and up to 8.7×.
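The capacity argument can be made concrete with a toy software model. This is only a sketch of the idea that skipping safe accesses frees tracking capacity: it is not HinTM's hardware design, and the function `run_transaction`, the 64-byte cache line, and the capacity of 16 lines are assumptions chosen for illustration.

```python
LINE_BYTES = 64  # assumed cache-line size for this toy model


def run_transaction(accesses, capacity, use_hints):
    """Simulate tracking of a transaction's accesses in a bounded buffer.

    accesses:  iterable of (address, is_safe) pairs; is_safe marks an access
               that, per the hint, can never cause a conflict-induced abort.
    capacity:  maximum number of distinct cache lines the buffer can track.
    use_hints: when True, safe accesses are not tracked at all.
    """
    tracked = set()
    for address, is_safe in accesses:
        if use_hints and is_safe:
            continue  # hinted-safe access consumes no tracking resources
        tracked.add(address // LINE_BYTES)
        if len(tracked) > capacity:
            return "capacity abort"
    return "commit"


# Hypothetical workload: 8 unsafe lines (shared data) and 100 safe lines
# (for example, thread-private data).
workload = [(i * LINE_BYTES, False) for i in range(8)]
workload += [((1000 + i) * LINE_BYTES, True) for i in range(100)]

without_hints = run_transaction(workload, capacity=16, use_hints=False)
with_hints = run_transaction(workload, capacity=16, use_hints=True)
```

In this model the same transaction overflows the buffer when every access is tracked but fits comfortably once safe accesses are excluded.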
ISBN: 9798350315875 (digital)
ISBN: 9798350315882 (print)
This paper proposes a robust encryption strategy for data protection within a Hadoop Distributed File System (HDFS) environment by integrating the Advanced Encryption Standard (AES) and MapReduce. Leveraging the speed of the 128-bit AES encryption algorithm in conjunction with the MapReduce parallel programming paradigm, the method achieves superior efficiency in the encryption of large amounts of crucial data. Furthermore, the implementation uses XTS mode, which builds on Phil Rogaway's XEX (Xor-Encrypt-Xor) construction and provides a robust defense against ciphertext manipulation and copy-and-paste attacks. This approach, known as AES-MR, employs parallel mappers and reducers to encrypt data chunks sequentially and concurrently. The paper demonstrates the efficacy and security of this method, suggesting it as a viable safety measure for safeguarding user-generated data in the HDFS context.
Data race is a notorious problem in parallel programming. There has been great research interest in type systems that statically prevent data races. Despite the progress in the safety and usability of these systems, l...