We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic blocks of parallel programming: reduce, prefix sum (scan), and reduce by index. We first present derivati...
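As a rough illustration of what a reverse-mode rule for a scan looks like, the sketch below covers only the simplest case, a prefix sum with ordinary addition. It is not the Futhark implementation described in the abstract; the function names `scan_add` and `scan_add_vjp` are hypothetical. Since each input x_i contributes to every output y_j with j >= i, the adjoint of x_i is the sum of the output adjoints from position i onward, that is, a reversed prefix sum.

```python
from itertools import accumulate


def scan_add(xs):
    """Inclusive prefix sum: ys[j] = xs[0] + ... + xs[j]."""
    return list(accumulate(xs))


def scan_add_vjp(ys_bar):
    """Reverse-mode rule for scan_add (hypothetical helper name).

    xs_bar[i] = sum of ys_bar[j] for all j >= i, computed as a prefix
    sum over the reversed adjoints, then reversed back.
    """
    return list(accumulate(reversed(ys_bar)))[::-1]


if __name__ == "__main__":
    print(scan_add([1, 2, 3]))
    print(scan_add_vjp([1, 1, 1]))
```

The backward pass is itself a scan, so in this special case it parallelizes the same way as the forward pass.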
In high-performance computing (HPC), the demand for efficient parallel programming models has grown dramatically since the end of Dennard Scaling and the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice due to its simplicity and portability, offering a directive-driven approach for shared-memory parallel programming. Despite its wide adoption, however, there is a lack of comprehensive data on the actual usage of OpenMP constructs, hindering unbiased insights into its popularity and evolution. This paper presents a statistical analysis of OpenMP usage and adoption trends based on a novel and extensive database, HPCORPUS, compiled from GitHub repositories containing C, C++, and Fortran code. The results reveal that OpenMP is the dominant parallel programming model, accounting for 45% of all analyzed parallel APIs. Furthermore, it has demonstrated steady and continuous growth in popularity over the past decade. Analyzing specific OpenMP constructs, the study provides in-depth insights into their usage patterns and preferences across the three languages. Notably, we found that while OpenMP has a strong "common core" of constructs in common usage (while the rest of the API is less used), there are new adoption trends as well, such as simd and target directives for accelerated computing and task for irregular parallelism. Overall, this study sheds light on OpenMP's significance in HPC applications and provides valuable data for researchers and practitioners. It showcases OpenMP's versatility, evolving adoption, and relevance in contemporary parallel programming, underlining its continued role in HPC applications and beyond. These statistical insights are essential for making informed decisions about parallelization strategies and provide a foundation for further advancements in parallel programming models and techniques. HPCORPUS, as well as the analysis scripts and raw results, are available at: https://***/Scientific-Computing-Lab-NRCN/HP
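A construct-usage count of the kind the abstract describes could be approximated as follows. This is a minimal sketch under stated assumptions, not the HPCORPUS analysis scripts: the function name `count_openmp_constructs` and the regular expression are hypothetical. It only recognizes directive lines beginning with `#pragma omp` (C and C++) or `!$omp` (Fortran) and tallies the first keyword of each directive.

```python
import re
from collections import Counter

# Assumption: a directive starts a line with "#pragma omp" (C/C++) or
# "!$omp" (free-form Fortran); the first word after it names the construct.
DIRECTIVE = re.compile(
    r"^[ \t]*(?:#[ \t]*pragma[ \t]+omp|!\$omp)[ \t]+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def count_openmp_constructs(source: str) -> Counter:
    """Count the leading construct keyword of each OpenMP directive."""
    counts = Counter()
    for match in DIRECTIVE.finditer(source):
        keyword = match.group(1).lower()
        if keyword == "end":  # skip Fortran closing directives
            continue
        counts[keyword] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "#pragma omp parallel for\n"
        "for (int i = 0; i < n; i++) a[i] = b[i];\n"
        "  #pragma omp simd\n"
        "!$OMP PARALLEL DO\n"
        "!$OMP END PARALLEL DO\n"
        "#pragma omp task\n"
    )
    print(count_openmp_constructs(sample))
```

A real analysis would also need to handle comments, line continuations, and combined constructs such as `parallel for`, which this sketch counts under `parallel` only.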
Parallel task models have recently received much attention in the development of real-time multiprocessor systems, as they allow highly compute-intensive tasks to have shorter deadlines, which modern reactive systems require. However, missing modularity and portability can make parallel programming a cumbersome endeavor. As a consequence, compute-intensive sectors in the desktop and server segment have relied on parallelism frameworks such as Intel Threading Building Blocks, Cilk and OpenMP. These parallelism frameworks, however, are optimized for decent average-case performance and consequently do not meet the strict requirements imposed by real-time systems. In this paper, we present a proof-of-concept parallelism framework implemented specifically for soft real-time systems, with the tight timing and safety requirements of such critical systems in mind. The proposed runtime system implements static memory allocation in a work-stealing environment that conforms to the strict space and tight probabilistic time bounds of work-stealing schedulers. Furthermore, we evaluate the performance of this framework by conducting multiprogrammed benchmarks on a real-time embedded multicore architecture.
Parallel programming remains a daunting challenge, from struggling to express a parallel algorithm without cluttering the underlying synchronous logic to describing which devices to employ to calculate correctness. Ov...
High-Performance Computing (HPC) is one of the strategic priorities for research and innovation worldwide due to its relevance for industrial and scientific applications. We envision HPC as composed of three pillars: ...
To address the speed reduction and estimation accuracy loss caused by using Metropolis-Hastings (MH) resampling in place of sequential importance resampling in the Metropolis-Hastings resampling particle filter, this paper proposes a parallel Metropolis-Hastings filter algorithm based on a multi-prediction framework, which shifts the particle filtering load from resampling to the prediction and update steps. The overhead of the multi-prediction framework can be easily compensated by a parallel implementation. The algorithm reduces global sequential operations by adding local parallel computation. Simulation experiments show that the method improves both real-time performance and state estimation accuracy.
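For context, the baseline Metropolis-style resampling step that the abstract starts from can be sketched as below. This is not the multi-prediction algorithm the paper proposes. The function name `metropolis_resample` and the `steps` parameter are assumptions made for illustration. Each particle runs its own short Markov chain over particle indices, proposing a uniformly random index and accepting it with probability min(1, w_j / w_k), so no global sum of the weights is needed and the outer loop could run in parallel.

```python
import random


def metropolis_resample(weights, steps, rng):
    """Return one ancestor index per particle (hypothetical helper).

    weights: non-negative, unnormalized particle weights.
    steps:   number of Metropolis iterations per particle (assumed tunable).
    rng:     a random.Random instance, passed in for reproducibility.
    """
    n = len(weights)
    ancestors = []
    for i in range(n):  # iterations are independent of each other
        k = i  # each chain starts at the particle's own index
        for _ in range(steps):
            j = rng.randrange(n)  # uniform proposal over indices
            # Accept with probability min(1, w_j / w_k), written without
            # a division so that zero weights are handled safely.
            if rng.random() * weights[k] < weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


if __name__ == "__main__":
    rng = random.Random(0)
    print(metropolis_resample([0.1, 0.4, 0.2, 0.3], steps=32, rng=rng))
```

With too few steps the chains have not mixed and the result is biased toward each particle's starting index, which is one source of the accuracy loss that motivates alternatives.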
Hardware Transactional Memory (HTM) is a high-performance instantiation of the powerful programming abstraction of transactional memory, which simplifies the daunting, yet critically important, task of parallel programming. While many HTM implementations of varying complexity exist in the literature, commercially available HTMs impose rigid restrictions on transaction and system behavior, limiting their practical use. A key constraint is the limited size of supported transactions, implicitly capped by hardware buffering capacity. We identify the opportunity to expand the effective capacity of these limited hardware structures by being more selective about which memory accesses need to be tracked. We leverage compiler and virtual memory support to identify safe memory accesses, which can never cause a transaction abort, and pass them as safety hints to the underlying HTM. With minor extensions over a conventional HTM implementation, HinTM uses these hints to selectively allocate transactional state tracking resources to unsafe accesses only, thus expanding the HTM’s effective capacity and in turn reducing capacity aborts. We demonstrate that HinTM effectively augments the performance of a range of baseline HTM configurations. When coupled with a POWER8 HTM implementation, HinTM eliminates 64% of transactional capacity aborts, achieving 1.4× average speedup, and up to 8.7×.
ISBN: 9798350315875 (digital)
ISBN: 9798350315882 (print)
This paper proposes a robust encryption strategy for data protection within a Hadoop Distributed File System (HDFS) environment by integrating the Advanced Encryption Standard (AES) and MapReduce. Leveraging the speed of the 128-bit AES encryption algorithm in conjunction with the MapReduce parallel programming paradigm, the method achieves superior efficiency in the encryption of large amounts of crucial data. Furthermore, the implementation uses XTS mode, based on Phil Rogaway's XEX (Xor-Encrypt-Xor) construction, which provides a robust defense against ciphertext manipulation and copy-and-paste attacks. This approach, known as AES-MR, employs parallel mappers and reducers to encrypt data chunks sequentially and concurrently. The paper demonstrates the efficacy and security of this method, suggesting it as a viable measure for safeguarding user-generated data in the HDFS context.
Data races are a notorious problem in parallel programming. There has been great research interest in type systems that statically prevent data races. Despite the progress in the safety and usability of these systems, l...
ISBN: 9781728101903 (print)
Much progress has been made on integrating parallel programming into the core Computer Science curriculum of top-tier universities in the United States. For example, "COMP 322: Introduction to Parallel Programming" at Rice University is a required course for all undergraduate students pursuing a bachelor's degree. It teaches a wide range of parallel programming paradigms, from task-parallel to SPMD to actor-based programming. However, courses like COMP 322 do little to support members of the Computer Science community who need to develop these skills but are not currently enrolled in a four-year program with parallel programming in the curriculum. This group includes (1) working professionals, (2) students at U.S. universities without parallel programming courses, and (3) students in countries other than the United States without access to a parallel programming course. To serve these groups, Rice University launched the "Parallel, Concurrent, and Distributed Programming in Java" Coursera specialization on July 31, 2017. In 2017, the authors of that specialization also wrote an experience paper about launching it. In this paper, the sequel to our previous publication, we look back at the first year of the Coursera specialization. In particular, we ask the following questions: (1) how did our assumptions about the student body for this course hold up? (2) how has the course changed since launch? and (3) what can we learn about how students are progressing through the specialization from Coursera's built-in analytics?