ISBN (print): 9781538655559
This paper describes a parallel solver framework focused on flow and geomechanics reservoir simulation applications. It has been designed to run efficiently on a wide range of target platforms, from desktop workstations to heterogeneous clusters of multicore nodes, with or without GPUs, using a framework for distributed matrices and vectors based on a two-tier hierarchical architecture. Results show good parallel scalability on clusters of multicore nodes. Comparisons with the PETSc library indicate it is competitive with the best available tools. Preliminary tests indicate good speedups and parallel scalability also on multiple GPUs.
ISBN (print): 9780983567899
Task parallel programming models such as Habanero Java help developers write idiomatic parallel programs and avoid common errors. Data race freedom is a desirable property for task parallel programs but is difficult to prove because every possible execution of the program must be considered. A partial order over events of an observed program execution induces an equivalence class of executions that the program may also produce. The Does-not-Commute (DC) relation is an efficiently computable partial order used for data race detection. As a relatively weak partial order, the DC relation can represent relatively large equivalence classes of program executions. However, some of these executions may be infeasible, thus leading to false data race reports. The contribution of this paper is a mechanized proof that the DC relation is actually sound for commonly used task parallel programming models. Sound means that the first data race identified by the DC relation is guaranteed to be a real data race. A prototype analysis in the Java Pathfinder model checker shows that the DC relation can significantly reduce the number of explored states required to prove data race freedom in Habanero Java programs. In this application, the search for data race using the DC relation is both sound and complete.
ISBN (print): 9781538655559
As the search grows for computational models in which parallelism is expressed naturally, some paradigms arise as options for the current generation of computers. In this context, dynamic Dataflow and Gamma (General Abstract Model for Multiset mAnipulation) emerge as interesting computational model choices. In the dynamic Dataflow model, operations are performed as soon as their associated operands are available, without relying on a Program Counter to dictate the execution order of instructions. The Gamma paradigm is based on a parallel multiset rewriting scheme. It provides a nondeterministic execution model inspired by an abstract chemical machine metaphor, where operations are formulated as reactions that occur freely among matching elements belonging to the multiset. In this work, equivalence relations between the dynamic Dataflow and Gamma paradigms are exposed and explored, and methods to convert from the Dataflow to the Gamma paradigm and vice versa are provided. It is shown that vertices and edges of a dynamic Dataflow graph can correspond, respectively, to reactions and multiset elements in the Gamma paradigm. This work lets the scientific community take advantage of both parallel programming models, giving researchers and developers added versatility. Finally, to the best of our knowledge, the similarity relations between the dynamic Dataflow and Gamma models presented here have not been reported in any previous work.
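The multiset rewriting idea behind Gamma can be illustrated with a minimal Python sketch. This is not the conversion method of the paper: the function name `gamma`, its parameters, and the pairwise (two-element) reaction form are assumptions made for illustration only. A reaction fires on any pair of elements that satisfies the condition, the pair is replaced by the products, and execution stops when no pair can react.

```python
import random
from typing import Callable, List, Optional


def gamma(
    multiset: List[int],
    condition: Callable[[int, int], bool],
    action: Callable[[int, int], List[int]],
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical Gamma-style interpreter for pairwise reactions.

    Repeatedly picks, at random, an ordered pair of distinct elements that
    satisfies `condition`, removes both, and adds the result of `action`.
    Terminates when no pair can react (the multiset is stable).
    """
    rng = rng or random.Random(0)  # fixed seed only to make runs repeatable
    pool = list(multiset)
    while True:
        # All ordered pairs of distinct positions whose values may react.
        candidates = [
            (i, j)
            for i in range(len(pool))
            for j in range(len(pool))
            if i != j and condition(pool[i], pool[j])
        ]
        if not candidates:
            return pool
        i, j = rng.choice(candidates)  # nondeterministic choice of reaction
        products = action(pool[i], pool[j])
        pool = [x for k, x in enumerate(pool) if k not in (i, j)] + products


# Maximum: a pair (x, y) with x >= y reacts and leaves only x behind.
largest = gamma([4, 9, 1, 9, 7], lambda x, y: x >= y, lambda x, y: [x])

# Sum: any pair reacts and is replaced by its sum.
total = gamma([1, 2, 3, 4], lambda x, y: True, lambda x, y: [x + y])
```

Both example programs reach the same stable multiset regardless of the order in which reactions fire, which is why the random choice does not affect the final result here.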
ISBN (print): 9781510626300
This report considers the development of a unified MC-based simulation platform for the needs of Biomedical Optics and its practical use in the creation of novel Optical Diagnostics, Imaging and Sensing modalities aided by Artificial Intelligence (AI) methods. It will be demonstrated how the developed MC platform can be used to generate validated lookup tables and labeled data sets, and subsequently to train several configurations of AI-based methods for real-time estimation of specific tissue properties of interest, such as the distributions of melanin, blood, and oxygenation. Prototypes of lightweight AI-empowered sensing solutions that could potentially be shrunk to a smartphone or wearable-device form factor will be presented, and their performance will be compared with traditional spectroscopy-based methods using phantom and in vivo experimental data obtained during clinical studies.
ISBN (print): 9781728145693
Nowadays, any personal computer includes a GPU whose parallelism can be used to speed up computations. Unfortunately, taking advantage of such parallel architectures is not a trivial task. In this paper, we present a library of swarm intelligence metaheuristics that provides automatic parallelization. The library is freely available, is implemented in CUDA, and includes parallel versions of metaheuristics for both continuous and discrete domains. We show the basic ideas behind the parallelization of one of the metaheuristics, and we demonstrate its usefulness with empirical results.
ISBN (print): 9781450376389
Structured parallel programming has been studied and applied in several programming languages. This approach has proven to be suitable for abstracting low-level and architecture-dependent parallelism implementations. Our goal is to provide a structured and high-level library for the Rust language, targeting parallel stream processing applications for multi-core servers. Rust is an emerging programming language developed by the Mozilla Research group, focusing on performance, memory safety, and thread safety. However, it lacks parallel programming abstractions, especially for stream processing applications. This paper contributes a new API based on the structured parallel programming approach to simplify parallel software development. Our experiments highlight that our solution provides higher-level parallel programming abstractions for stream processing applications in Rust. We also show that the throughput and speedup are comparable to the state of the art for certain workloads.
Many real-world applications feature data accesses on periodic domains. Manually implementing the synchronizations and communications associated with the data dependences in each case is cumbersome and error-prone. It is increasingly interesting to support these applications in high-level parallel programming languages or parallelizing compilers. In this paper, we present a technique that, for distributed-memory systems, calculates the specific communications derived from data-parallel codes with or without periodic boundary conditions on affine access expressions. It makes the management of aggregated communications for the chosen data partition transparent to the programmer. Our technique moves to runtime part of the compile-time analysis typically used to generate the communication code for affine expressions, introducing a completely new technique that also supports periodic boundary conditions. We present an experimental study that evaluates our proposal on several case studies. Our experimental results show that our approach can automatically obtain communication code as efficient as that found in MPI reference codes, while reducing the development effort.
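To make the role of periodic boundary conditions concrete, the following Python sketch shows the simplest case: a one-dimensional block partition in which each process exchanges data with its left and right neighbours. It is an illustration only, not the runtime technique described in the abstract; the function name `halo_neighbours` and its parameters are hypothetical.

```python
from typing import Optional, Tuple


def halo_neighbours(
    rank: int, nprocs: int, periodic: bool
) -> Tuple[Optional[int], Optional[int]]:
    """Return the (left, right) neighbour ranks of `rank` in a 1D partition.

    With periodic boundaries the domain wraps around, so the first and last
    processes are neighbours. Without them, a missing neighbour is None and
    no communication is needed on that side.
    """
    if not 0 <= rank < nprocs:
        raise ValueError("rank must satisfy 0 <= rank < nprocs")
    if periodic:
        # Modulo arithmetic implements the wrap-around at both ends.
        return (rank - 1) % nprocs, (rank + 1) % nprocs
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < nprocs - 1 else None
    return left, right
```

Writing this wrap-around logic by hand for every access pattern and every partition is the kind of effort the abstract describes as cumbersome and error-prone.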
ISBN (print): 9781450372411
The main objective of the study is to develop a reactive-transport model able to utilize HPC resources. The primary purpose is to construct a mathematical representation of a proposed reactive-transport system in order to simulate the potential risk of environmental contamination. The contribution of the study lies not only in its use of HPC but also in new model features implemented during the development phase. Overall, the Transport-Reaction Model (TRM) was developed to include the complex functionality needed to solve specific transport-reaction problems. TRM is based on coupling the PhreeqcRM geochemical library with 2D solute species transport in water on a regular rectangular network of elements. Compared to other similar models, our model offers a unique feature associated with the 2D mesh. This feature is an innovative component that improved our modelling results. Testing revealed that TRM accelerates simulations on up to 16 threads. Adding further resources, up to 20 or 24 threads, also speeds up the calculation but decreases the efficiency of the parallel solution. In general, TRM runs optimally on 16 threads.
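The distinction between a faster run and a less efficient one follows from the conventional definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of threads). The sketch below assumes these standard definitions; the abstract does not state which metric the authors used, and the function name is hypothetical.

```python
from typing import Tuple


def speedup_and_efficiency(
    t_serial: float, t_parallel: float, threads: int
) -> Tuple[float, float]:
    """Return (speedup, efficiency) under the conventional definitions.

    speedup    = t_serial / t_parallel
    efficiency = speedup / threads
    """
    if t_serial <= 0 or t_parallel <= 0 or threads < 1:
        raise ValueError("times must be positive and threads at least 1")
    speedup = t_serial / t_parallel
    return speedup, speedup / threads
```

Under these definitions, a run on more threads can finish sooner while its efficiency falls, because the speedup grows more slowly than the thread count.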
ISBN (print): 9781728147895
The development of various distributed STMs, which aid the parallel programming of distributed systems, currently attracts the interest of many researchers. In this paper, we develop a Python distributed STM based on data replication, which provides better performance as well as tolerance to replica faults. The solution supports both eventual and sequential data consistency. Experimental results show that reading t-variables from a local replica is up to 16 times faster than reading them from the base replica.
ISBN (digital): 9783030346270
ISBN (print): 9783030346270; 9783030346263
This paper outlines a research and development program to enhance modern compiler technology, and the LLVM compiler infrastructure specifically, to directly optimize parallel-programming-model constructs. The goal is to produce higher-quality code, and moreover, to remove abstraction penalties generally associated with such constructs. We believe that such abstraction penalties are increasing in importance due to C++ parallel-algorithms libraries and other performance-portability-motivated programming methods. In addition, we will discuss when, and more importantly when not, explicit parallelism-awareness is necessary within the compiler in order to enable the desired optimization capabilities.