ISBN: 3540437924 (print)
ParallelNuscaS is an object-oriented package for parallel finite element modeling, developed at the Technical University of Czestochowa. This paper investigates the performance of the package on the ACCORD cluster, which was built this year at the University's Institute of Mathematics and Computer Science. At present, ACCORD contains 18 Pentium III 750 MHz processors in 9 SMP nodes, connected by both the fast Myrinet network and standard Fast Ethernet, as well as 8 SMP nodes with 16 AMD Athlon MP 1.2 GHz processors. We discuss the implementation and performance of parallel FEM computations not only for the message-passing model of parallel programming, but also for the hybrid model, which combines multithreading inside SMP nodes with message passing between them.
ISBN: 9781467376846 (print)
Despite the fact that we are firmly in the multicore era, parallel programming is not as widespread as it could be, either in the software industry or in education. There have been many calls to incorporate more parallel programming content into undergraduate computer science education. One obstacle to doing this is that the programming languages most commonly used for parallel programming are detailed, low-level languages such as C, C++, Fortran (with OpenMP or MPI), OpenCL and CUDA. These languages allow programmers to write very efficient code, but efficiency matters less to those whose goal is to learn the concepts of parallel computing. This paper introduces a parallel programming language called Tetra, which provides parallel programming features as first-class language features, includes garbage collection, and is designed to be as simple as possible. Tetra also includes an integrated development environment that is specifically geared toward debugging parallel programs and visualizing program execution across multiple threads.
ISBN: 9781509035311 (print)
Cyber-physical systems (CPSs) are embedded systems that are tightly integrated with their physical environment. The correctness of a CPS depends on the output of its computations and on the timeliness of completing the computations. This paper proposes the ForeC language for the deterministic parallel programming of CPS applications on multi-core execution platforms. ForeC's synchronous semantics is designed to greatly simplify the understanding and debugging of parallel programs. ForeC allows programmers to express many forms of parallel patterns while ensuring that programs are amenable to static timing analysis. One of ForeC's main innovations is its shared variable semantics, which provides thread isolation and deterministic thread communication. Through benchmarking, we demonstrate that ForeC can achieve better parallel performance than Esterel, a widely used synchronous language for concurrent safety-critical systems, and OpenMP, a popular desktop solution for parallel programming. We also demonstrate that the worst-case execution time of ForeC programs can be estimated precisely.
ISBN: 9780738143057 (print)
Parallel patterns, views, and spaces are promising abstractions to capture the programmer's intent as well as the contextual information that can be used by an underlying runtime to efficiently map software to parallel hardware. These abstractions can be valuable in cases where an algorithm must accommodate requirements of code and performance portability across hardware architectures and vendor programming models. Kokkos is a parallel programming model for host and accelerator architectures that relies on these abstractions and targets these requirements. It consists of a pure C++ interface, a specification, and a programming library. The programming library exposes patterns and types and maps them to an underlying abstract machine model. The abstract machine model offers a generic view of parallel hardware. While Kokkos is gaining popularity in large-scale HPC applications at some DOE laboratories, we believe that the implemented concepts are of interest to a broader audience, including academia, as they may contribute to a generic, vendor- and architecture-independent education in parallel programming. In this work, we give an insight into the design considerations of this programming model and list important abstractions. Further, we document best practices obtained from giving virtual classes on Kokkos and give pointers to resources that the reader may consider valuable for a lecture on generic parallel programming for students with preexisting knowledge of this matter.
ISBN: 9781538619933 (print)
We present here the results of our investigation of a transactional model of parallel programming on cluster computing systems. This model is specifically targeted at graph applications, with the goal of harnessing the unstructured parallelism inherently present in many such problems. In this model, tasks for vertex-centric computations are executed optimistically in parallel as serializable transactions. A key-value-based globally shared object store is implemented in the main memory of the cluster nodes for storing the graph data. Task computations read and modify data in the distributed global store, without any explicitly programmed message passing in the application code. Based on this model, we developed a framework for parallel programming of graph applications on computing clusters. We present here the programming abstractions provided by this framework and its architecture. Using several graph problems, we illustrate the simplicity of the abstractions provided by this model. These problems include graph coloring, k-nearest neighbors, and single-source shortest path computation. We also illustrate how incremental computations can be supported by this programming model. Using these problems, we evaluate the transactional programming model and the mechanisms provided by this framework.
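To make the idea of optimistic, serializable vertex tasks concrete, the following is a minimal single-machine sketch of graph coloring. It is not the framework described in the abstract: `VersionedStore`, `color_vertex`, and `color_graph` are hypothetical names, threads stand in for cluster nodes, and an in-memory dictionary stands in for the distributed key-value store. Each task records the versions of the keys it reads and commits only if none of them changed in the meantime; otherwise it retries.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class VersionedStore:
    """In-memory key-value store with per-key versions (hypothetical stand-in
    for a distributed shared object store)."""

    def __init__(self):
        self._data = {}                # key -> (value, version)
        self._lock = threading.Lock()

    def read(self, key):
        """Return (value, version); unseen keys read as (None, 0)."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def commit(self, read_set, write_set):
        """Apply write_set atomically if every version in read_set is current."""
        with self._lock:
            for key, seen_version in read_set.items():
                if self._data.get(key, (None, 0))[1] != seen_version:
                    return False       # conflict: the caller must retry
            for key, value in write_set.items():
                version = self._data.get(key, (None, 0))[1]
                self._data[key] = (value, version + 1)
            return True


def color_vertex(store, graph, v):
    """Vertex-centric task: pick the smallest color unused by v's neighbours."""
    while True:
        read_set = {}
        used = set()
        for u in graph[v]:
            color, version = store.read(u)
            read_set[u] = version
            if color is not None:
                used.add(color)
        read_set[v] = store.read(v)[1]
        color = 0
        while color in used:
            color += 1
        if store.commit(read_set, {v: color}):
            return color
        # Validation failed: a neighbour committed concurrently, so re-execute.


def color_graph(graph, workers=4):
    """Run one task per vertex in parallel and return the final coloring."""
    store = VersionedStore()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda v: color_vertex(store, graph, v), graph))
    return {v: store.read(v)[0] for v in graph}
```

The application code contains no explicit communication; conflicts between neighbouring vertices are detected at commit time by version validation, which is one common way to obtain serializable behaviour under optimistic execution.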
ISBN: 9781665488020 (digital and print)
Developing performant parallel applications for the distributed environment is challenging and requires expertise in both the HPC system and the application domain. We have developed a C++-based framework called APPFIS that hides the system complexities by providing an easy-to-use interface for developing performance-portable structured grid-based stencil applications. APPFIS's user interface is hardware agnostic and provides partitioning, code optimization, and automatic communication for stencil applications in distributed HPC environments. In addition, it offers straightforward APIs for utilizing multiple GPU accelerators, shared memory, and node-level parallelization, with automatic optimization for overlapping computation and communication. We have tested the functionality and performance of APPFIS using several applications on three platforms (Stampede2 at Texas Advanced Computing Center, Bridges-2 at Pittsburgh Supercomputing Center, and the Summit supercomputer at Oak Ridge National Laboratory). Experimental results show performance comparable to hand-tuned code, with excellent strong and weak scalability up to 4096 CPUs and 384 GPUs.
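The partitioning and communication that such a framework automates can be illustrated with a small sequential sketch of a one-dimensional 3-point stencil. This is not the APPFIS interface: the function names are hypothetical, the blocks live in one process, and the halo "exchange" is a plain list copy where a distributed run would send messages between ranks. The sketch assumes at least two grid cells and no more blocks than cells.

```python
def stencil_step(u):
    """One 3-point averaging sweep; the first and last cells are held fixed."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def partition(u, parts):
    """Split u into `parts` contiguous blocks of near-equal size."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    return [u[bounds[p]:bounds[p + 1]] for p in range(parts)]


def partitioned_step(blocks):
    """Advance every block one sweep after fetching one-cell halos."""
    new_blocks = []
    for p, block in enumerate(blocks):
        # Halo cells: the neighbouring blocks' edge values (a message in MPI terms).
        left = [blocks[p - 1][-1]] if p > 0 else []
        right = [blocks[p + 1][0]] if p < len(blocks) - 1 else []
        swept = stencil_step(left + block + right)
        # Drop the halo cells again; only the block's own cells are kept.
        new_blocks.append(swept[len(left):len(left) + len(block)])
    return new_blocks
```

Because every block reads only the previous sweep's values, the blocks can be updated independently once the halos are in place, which is what makes the pattern suitable for distributed execution.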
ISBN: 3540628983 (print)
We present a small, extensible software interface for communication between different parallel programming models. With only four new commands, our PLUS communication interface can be easily integrated into existing parallel codes, allowing tasks to transparently communicate from, e.g., PVM to MPI and PARIX, or any other parallel programming model. PLUS is one important software module that has been developed within the Metacomputer Online initiative. The core idea of Metacomputer Online is to design small, versatile and extensible interfaces between existing software modules, with the goal of building a WAN metacomputer by linking suitable existing software packages. Our current PLUS implementation supports inter-process communication between PVM, MPI and PARIX. Much effort has been spent on optimizing the communication across internet and intranet links. As a result, our PLUS communication is usually faster than the raw stream socket TCP/IP communication supported by the various parallel programming models.
ISBN: 3540663630 (print)
We present an integrated environment for the systematic development of parallel and distributed programs. Our approach allows the user to construct complex applications by composing and transforming skeletons, i.e., recurring patterns of task and data parallelism. First academic and commercial experience with skeleton-based systems has demonstrated the benefits of the approach but also the lack of a dedicated set of methods for algorithm design and performance prediction. We take a first step towards such a set of methods by proposing an environment which integrates a framework for algorithm transformation, called FAN, with two existing skeleton-based programming systems: the academic system P3L and its commercial counterpart SkIE.
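As a rough illustration of composing and transforming skeletons, the sketch below defines two generic skeletons and two compositions that compute the same result. It does not reproduce FAN, P3L, or SkIE: `pipe`, `farm`, `scale`, and `shift` are hypothetical names, and a thread pool stands in for the parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def pipe(*stages):
    """Pipeline skeleton: feed the output of each stage into the next."""
    def run(value):
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run


def farm(worker, workers=4):
    """Farm skeleton: apply `worker` independently to every item, in input order."""
    def run(items):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(worker, items))
    return run


def scale(x):
    return 2 * x


def shift(x):
    return x + 1


# Two compositions with the same meaning but different parallel structure:
two_farms = pipe(farm(scale), farm(shift))   # all items finish stage 1 before stage 2
fused_farm = farm(pipe(scale, shift))        # one farm over the fused worker
```

Rewriting `two_farms` into `fused_farm` preserves the result while removing the synchronisation between the two stages; choosing between such equivalent forms is the kind of decision that transformation rules and performance prediction are meant to support.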
ISBN: 9783642141218 (print)
With the advent of large-scale heterogeneous platforms such as clusters and grids, resource failures are more likely to occur and have an adverse effect on the applications. Consequently, there is an increasing need for developing techniques to achieve reliability during execution. This paper presents FT-Jace, a new reliable programming model for grid computing environments. FT-Jace achieves reliability in a manner that is transparent to the programmer. It is based on an active replication scheme capable of supporting r arbitrary fail-silent (a faulty node does not produce any output) and fail-stop (no node recovery) node failures. The strength of our programming environment is that the deployment of the application does not require complicated mechanisms for failure detection. More precisely, node failures are masked, and there is no need for detecting and handling such failures. We provide experimental results obtained on the Grid'5000 platform to demonstrate the usefulness of FT-Jace.
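The masking effect of active replication can be sketched in a few lines. This is only an illustration under stated assumptions, not FT-Jace itself: `run_replicated` is a hypothetical helper, threads stand in for grid nodes, the `faulty` set simulates fail-silent nodes, and the sketch assumes that r + 1 replicas are used to tolerate r such failures.

```python
import queue
import threading


def run_replicated(task, arg, replicas, faulty=frozenset()):
    """Run `task(arg)` on every replica and return one surviving result.

    Replicas whose rank is in `faulty` are fail-silent: they produce no output.
    """
    results = queue.Queue()

    def replica(rank):
        if rank in faulty:
            return                    # fail-silent: nothing is ever reported
        results.put(task(arg))

    threads = [threading.Thread(target=replica, args=(rank,))
               for rank in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if results.empty():
        raise RuntimeError("all replicas failed")
    # Any surviving answer will do: the caller never learns which nodes failed.
    return results.get()
```

For example, `run_replicated(lambda x: x * x, 7, replicas=3, faulty={0, 2})` still returns a result because one replica survives. No failure detector is involved; the failures are masked by the surviving replica rather than detected and handled.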
ISBN: 9781905088416 (print)
This contribution presents a computational framework for simulation and gradient-based structural optimization of geometrically nonlinear and large-scale structural finite element models. CAGD-free optimization methods have been developed to integrate shape optimization in an early stage of design and to reduce the related modelling effort. To overcome the problem of an increasing numerical cost due to the large design space, the design sensitivities for objectives and constraints are evaluated via adjoint formulations. A new parallel computation strategy for sensitivity evaluation is presented which takes advantage of a completely parallelized simulation and optimization environment. Two application examples illustrate the method and demonstrate the high parallel efficiency.
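For readers unfamiliar with adjoint sensitivities, the generic form of the argument can be written out as follows. The notation is an assumption made here for illustration and is not taken from the paper: $s$ denotes the design variables, $u$ the state (displacements), $R(u, s) = 0$ the discretized equilibrium equations, and $f(u, s)$ an objective or constraint.

```latex
% Total derivative of the response with respect to the design variables
\frac{\mathrm{d}f}{\mathrm{d}s}
  = \frac{\partial f}{\partial s}
  + \frac{\partial f}{\partial u}\,\frac{\mathrm{d}u}{\mathrm{d}s}

% Differentiating the equilibrium condition R(u(s), s) = 0
\frac{\partial R}{\partial u}\,\frac{\mathrm{d}u}{\mathrm{d}s}
  = -\,\frac{\partial R}{\partial s}

% Adjoint problem: one linear solve per response function f
\left(\frac{\partial R}{\partial u}\right)^{\!T} \lambda
  = \left(\frac{\partial f}{\partial u}\right)^{\!T}

% Resulting sensitivity, with no solve per design variable
\frac{\mathrm{d}f}{\mathrm{d}s}
  = \frac{\partial f}{\partial s}
  - \lambda^{T}\,\frac{\partial R}{\partial s}
```

The cost of the linear solves therefore scales with the number of response functions rather than with the number of design variables, which is why the adjoint form suits large design spaces.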