ISBN: (Print) 9781509027729
MPI (Message Passing Interface) and OpenMP are two tools broadly used to develop parallel programs. On the one hand, MPI offers high performance but is difficult to use. On the other hand, OpenMP is very easy to use but is restricted to shared-memory architectures. CAPE is a checkpoint-based approach that allows OpenMP programs to be executed on distributed-memory architectures. This paper presents an in-depth analysis and an evaluation of the performance of CAPE by comparing the execution models of CAPE and MPI. Some suggestions are also provided to improve the use of CAPE.
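To make the comparison of execution models concrete, the following is a minimal sketch (not from the paper; problem size and data are placeholders) of the kind of OpenMP loop CAPE targets: a single directive parallelizes the loop on shared memory, and CAPE aims to run the same unmodified code across distributed nodes via checkpointing, where MPI would require explicit partitioning and message passing.

#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    const int N = 1000000;                          // hypothetical problem size
    std::vector<double> a(N), b(N), c(N);
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2.0 * i; }

    // One OpenMP directive parallelizes the loop on a shared-memory machine;
    // a CAPE-like system would checkpoint and distribute this parallel region
    // across nodes, whereas MPI would need explicit data partitioning and sends.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        c[i] = a[i] + b[i];

    std::printf("c[0]=%f c[N-1]=%f\n", c[0], c[N - 1]);
    return 0;
}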
Using the Message Passing Interface (MPI) in C++ has been difficult up to this point because of the lack of suitable C++ bindings and C++ class libraries. The existing MPI standard provides language bindings only for C and Fortran 77, precluding their direct use in object-oriented programming. Even the proposed C++ bindings in MPI-2 are at a fairly low level and are not directly suitable for object-oriented programming. In this paper, we present the requirements, analysis, and design for Object-Oriented MPI (OOMPI), a C++ class library for MPI. Although the OOMPI class library is specified in C++, in some sense the specification is a generic one that uses C++ as the program description language. Thus, the OOMPI specification can also be considered a generic object-oriented class library specification, which can form the basis for MPI class libraries in other object-oriented languages.
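To illustrate the motivation, here is a hedged sketch of a thin, hypothetical C++ wrapper over the standard MPI C bindings; the class and method names are invented for illustration and are not the actual OOMPI interface.

#include <mpi.h>
#include <iostream>

// Hypothetical wrapper class; OOMPI's real class names and methods differ.
class Communicator {
public:
    explicit Communicator(MPI_Comm comm = MPI_COMM_WORLD) : comm_(comm) {}
    int rank() const { int r; MPI_Comm_rank(comm_, &r); return r; }
    int size() const { int s; MPI_Comm_size(comm_, &s); return s; }
    void send(int value, int dest, int tag = 0) const {
        MPI_Send(&value, 1, MPI_INT, dest, tag, comm_);
    }
    int recv(int source, int tag = 0) const {
        int value;
        MPI_Recv(&value, 1, MPI_INT, source, tag, comm_, MPI_STATUS_IGNORE);
        return value;
    }
private:
    MPI_Comm comm_;
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    Communicator world;                     // object-oriented view of MPI_COMM_WORLD
    if (world.rank() == 0 && world.size() > 1)
        world.send(42, 1);                  // rank 0 sends an int to rank 1
    else if (world.rank() == 1)
        std::cout << "received " << world.recv(0) << std::endl;
    MPI_Finalize();
    return 0;
}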
ISBN: (Print) 9781424400546
This paper presents and evaluates the alternatives available to support threadprivate data in OpenMP. We show how current compilation systems rely on custom techniques for implementing thread-local data, even though the ELF binary specification already supports data sections that are threadprivate by default; ELF's name for such areas is thread-local storage (TLS). Our experiments demonstrate that implementing threadprivate on top of TLS support is straightforward and more efficient. This proposal is in line with the future implementation of OpenMP in the GNU Compiler Collection. In addition, our experience with the use of threadprivate in OpenMP applications shows that it is usually better to avoid it: threadprivate variables reside in common blocks, which prevents the compiler from fully optimizing the code. It is therefore better to treat threadprivate as a temporary technique that merely eases porting MPI codes to OpenMP.
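As a concrete illustration (a minimal sketch, assuming a compiler with OpenMP support, e.g. built with -fopenmp), a file-scope variable declared threadprivate can be mapped by an implementation directly onto ELF TLS rather than onto custom per-thread bookkeeping:

#include <omp.h>
#include <cstdio>

// File-scope variable made private to each thread via the OpenMP directive;
// a TLS-based implementation can place it in the ELF .tdata/.tbss sections.
static int counter = 0;
#pragma omp threadprivate(counter)

int main() {
    #pragma omp parallel
    {
        counter = omp_get_thread_num();   // each thread writes its own copy
        #pragma omp barrier
        std::printf("thread %d sees counter = %d\n",
                    omp_get_thread_num(), counter);
    }
    return 0;
}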
ISBN: (Print) 9783981080162
Nowadays, most embedded devices need to support multiple applications running concurrently. In contrast to desktop computing, the set of applications is very often known at design time, and the designer needs to ensure that critical applications meet their constraints in every possible use-case. To do so, all possible use-cases, i.e., subsets of applications running simultaneously, have to be verified thoroughly. One approach to reducing the verification effort is to perform composability analysis, which has been studied for sets of applications modeled as Synchronous Dataflow Graphs. In this paper we introduce a framework that supports a more general parallel programming model based on the Kahn Process Networks Model of Computation and integrates a complete MPSoC programming environment that includes compiler-centric analysis, performance estimation, and simulation, as well as mapping and scheduling of multiple applications. In our solution, composability analysis is performed on parallel traces obtained by instrumenting the application code. A case study on three typical embedded applications, JPEG, GSM, and MPEG-2, demonstrates the applicability of our approach.
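For reference, the Kahn Process Networks model underlying the framework can be pictured as processes communicating over FIFO channels with blocking reads. The following minimal producer/consumer sketch (the channel class and names are illustrative, not part of the described framework) shows the idea using shared-memory threads.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Minimal FIFO channel with blocking reads, as in the KPN model of computation.
class Channel {
public:
    void write(int v) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(v);
        cv_.notify_one();
    }
    int read() {                              // blocks until a token is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        return v;
    }
private:
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    Channel ch;
    std::thread producer([&] { for (int i = 0; i < 5; ++i) ch.write(i * i); });
    std::thread consumer([&] { for (int i = 0; i < 5; ++i)
                                   std::cout << ch.read() << "\n"; });
    producer.join();
    consumer.join();
    return 0;
}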
In this paper, the performance of parallel computing is discussed thoroughly in the domain of image matching. Image matching, which compares two images for similarities, is widely used in security, medical, and computer vision applications. However, depending on the size of the images, the computation may not be manageable on a single processor running a sequential algorithm. To overcome this limitation, parallel computing is introduced through the Message Passing Interface (MPI) library. In this project, the two images to be compared are first converted to grayscale and then compared using the Sum of Square Differences (SSD) algorithm. A parallel network of 12 processors was implemented to perform the matching and to measure the performance of the SSD algorithm on both images. The performance gain with 12, 8, 4, and 2 processors was compared against the performance of a single processor. The results show a linear relationship between the performance gain and the number of processors used, demonstrating that there are significant benefits of parallelism for SSD applications.
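A hedged sketch of the core computation described above, assuming the two grayscale images are already loaded into arrays of equal length (image loading, and the exact partitioning and cluster setup used in the project, are not shown):

#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;                                   // hypothetical pixel count
    std::vector<unsigned char> img1(N, 100), img2(N, 102);   // placeholder grayscale data

    // Each rank computes the SSD over its contiguous slice of pixels.
    int chunk = N / size;
    int begin = rank * chunk;
    int end   = (rank == size - 1) ? N : begin + chunk;
    double local_ssd = 0.0;
    for (int i = begin; i < end; ++i) {
        double d = static_cast<double>(img1[i]) - static_cast<double>(img2[i]);
        local_ssd += d * d;
    }

    // Partial sums are combined on rank 0.
    double total_ssd = 0.0;
    MPI_Reduce(&local_ssd, &total_ssd, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("SSD = %f\n", total_ssd);

    MPI_Finalize();
    return 0;
}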
In this paper we examine how a network processor can be modeled using object-oriented techniques. Focusing on the Intel IXP 1200 network processor, we discuss how the object-oriented language POOSL was used to evaluate a system before implementing it with hardware and software components. With the IXP 1200 case study, we illustrate the suitability of object-oriented languages for system-level modeling and design exploration.
The machine model considered in this paper is that of a distributed-memory parallel processor (DMPP) with a two-dimensional torus topology. Within this framework, the authors study the relationship between the speedup delivered by compiler-parallelized code and the machine's interprocessor communication speed. It is shown that compiler-parallelized code often exhibits more interprocessor communication than manually parallelized code and that its performance is therefore more sensitive to the machine's interprocessor communication speed. Because of this, a parallelizing compiler developed for a platform not explicitly designed to sustain the increased interprocessor communication will, in the general case, produce code that delivers disappointing speedups. Finally, the study identifies the point of diminishing returns for interprocessor communication speed, beyond which the DMPP designer should focus on improving other architectural parameters, such as the local memory-processor bandwidth.
High-performance computing in heterogeneous environments is a dynamically developing area. A number of highly efficient heterogeneous parallel algorithms have been designed over the last decade. At the same time, scientific software based on these algorithms remains very much below par. The paper analyses the main issues encountered by scientific programmers when implementing heterogeneous parallel algorithms in a portable form. It explains how programming systems can address these issues in order to facilitate the implementation of parallel algorithms for heterogeneous platforms as much as possible, and it outlines two existing programming systems for high-performance heterogeneous computing, mpC and HeteroMPI.
A precise characterization of problems that can be parallelized by the divide-and-conquer method is obtained. A parallel algorithm for such problems is developed; it clusters the data so that a substantial amount of computation is carried out within sites while the communication cost remains manageable. The characterization turns out to be very simple, yet general enough to cover concrete problems in many disciplines, such as sorting, computing the transitive closure of a binary relation, and computing a minimal cover when decomposing relations into 3NF form. A linear recursive program is given to describe the problems to be parallelized.
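The divide-and-conquer pattern that the characterization targets can be sketched generically. The following minimal example (a parallel sum with OpenMP tasks, not taken from the paper) shows the split/solve/combine structure in which the recursive calls are the natural units to distribute across sites.

#include <omp.h>
#include <vector>
#include <cstddef>
#include <cstdio>

// Generic divide-and-conquer skeleton: split the data, solve the halves
// (possibly on different threads/sites), then combine the partial results.
// Here the combine step is simple addition.
long long dac_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo < 1024) {                       // small enough: solve directly
        long long s = 0;
        for (std::size_t i = lo; i < hi; ++i) s += v[i];
        return s;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    long long left = 0, right = 0;
    #pragma omp task shared(left)               // subproblems run as tasks
    left = dac_sum(v, lo, mid);
    #pragma omp task shared(right)
    right = dac_sum(v, mid, hi);
    #pragma omp taskwait                        // wait before combining
    return left + right;
}

int main() {
    std::vector<int> v(1 << 20, 1);
    long long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = dac_sum(v, 0, v.size());
    std::printf("sum = %lld\n", total);
    return 0;
}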
The author reports on an expert system, PrIAM, that supports programming for distributed memory massively parallel computers using graphical visualization as a centerpiece. The system provides mechanisms for the concise formal specification of families of numerical algorithms, their visualization as annotated graphs, a powerful set of formal (semantic and syntactic) transformations to tailor the specification to suit distributed memory multiprocessors, and dynamic graphical animation of the behavior of message-passing parallel programs with graphical summaries of their performance. For the most part, PrIAM is a straightforward expert system with three components: the user interface, the database, and the rule base. As much of PrIAM as possible has been pushed into the user interface, yielding a system that is easy to use and maintain. PrIAM currently serves two purposes: as a tutoring system to support lecture courses in parallel programming, and as a part of a programming environment for transputer-based multiprocessor systems.